Abstract Conflict, from small-scale verbal disputes to large-scale violent war 
between nations, is one of the most fundamental elements of social life and a 
central topic in social science research. The main argument of this book is that 
computational approaches have enormous potential to advance conflict research, 
e.g., by making use of the ever-growing computer processing power to model 
complex conflict dynamics, by drawing on innovative methods from simulation to 
machine learning, and by building on vast quantities of conflict-related data that 
emerge at unprecedented scale in the digital age. Our goal is (a) to demonstrate how 
such computational approaches can be used to improve our understanding of conflict 
at any scale and (b) to call for the consolidation of computational conflict research 
as a unified field of research that collectively aims to gather such insights. We first 
give an overview of how various computational approaches have already influenced 
conflict research and then guide the reader through the chapters that form part of 
this book. Finally, we propose to map the field of computational conflict research 
by positioning studies in a two-dimensional space depending on the intensity of the 
analyzed conflict and the chosen computational approach. 
E. Deutschmann et al. 
1 Introduction 


From small-scale, non-violent disputes to large-scale war between nations, conflict 
is a central element of social life and has captivated the collective consciousness 
for millennia. In the fifth century B.C., Greek philosopher Heraclitus famously 
argued that “war is the father and king of all” and that conflict and strife between 
opposites maintains the world (Graham, 2019). Many centuries later, sociologist 
Georg Simmel would, in a similar vein, state that a society without conflict 
is "not only impossible empirically, but it would also display no essential life- 
process and no stable structure." Social life, he posited, always "requires some 
quantitative relation of harmony and disharmony, association and dissociation, 
liking and disliking, in order to attain to a definite formation" (Simmel, 1904, 
p. 491). Many related assertions could be listed: From Marx's depiction of all past 
history as class struggle (Marx and Engels, 2002) to Dahrendorf's conflict theory, 
which put clashing interests between conflict groups at the heart of questions of 
social stability and change (Ritzer and Stepnisky, 2017), conflict is seen as the 
fundamental principle that shapes society and history: “Because there is conflict, 
there is historical change and development,” as Dahrendorf (1959, p. 208) put it. 

Between societies, too, conflict has long been recognized as an essential force. In 
history and political philosophy, many of the classic works are centered on clashes 
and contentions: From Thucydides’ History of the Peloponnesian War and Caesar's 
De bello Gallico to Machiavelli's Prince and Hobbes's Leviathan, the issue of 
violent struggle for power between cities, states, and empires of all kinds has been 
key. From psychology to international relations, conflict is one of the central fields 
of inquiry, with classic work searching for the root causes of conflict at various 
levels of analysis, from individual human predispositions and behavior to the spread 
of ideology and structural relations between states to the anarchic international 
system (Waltz, 2001; Rapoport, 1995). In short, it is hard to imagine human life 
without conflict. Rather, conflict can be seen as a “chronic condition” (Rapoport, 
1995, p. xxi) we have to live under. Consequently, it is unsurprising that efforts to 
understand conflict have been abundant. The statement that “more has been written 
on conflict than on any other subject save two: love and God” (Luce and Raiffa, 
1989, as cited in Rapoport, 1995, p. xxi) puts this impression into words.1 

While this centrality of conflict for the human condition may be justification 
enough for the continued attempts of a range of scientific fields to better understand 
conflict in its manifold forms, another central motivation is, of course, the search 
for ways to control, reduce, or even prevent conflict. “Are there ways of decreasing 
the incidence of war, or increasing the chances of peace? Can we have peace more 
often in the future than in the past?” asks, for instance, Waltz (2001, p. 1). This 
desire to contribute to a safer, less conflictual world became most urgent in the 
face of total annihilation during the Cold War. As Rapoport (1995, p. xxii) put it, 
talking about nuclear war: “understand it we must, if we want a chance of escaping 
what it threatens.” An entire field, peace science, now aims at understanding 
the conditions for conflict resolution. Improving our understanding of the causes, 
structures, mechanisms, spatio-temporal dynamics, and consequences of conflict is 
thus an important goal of social science research. 

1 Luce and Raiffa (1989, p. 1) actually talk specifically about “conflict of interest” and mention 
“inner struggle” as a third topic that has received “comparable attention” apart from “God” and 
“love.” Rapoport may have misremembered the exact quote or he deliberately subsumed “conflict 
of interest” and “inner struggle” under “conflict.” 

Recently, computational social science has set out to advance social research 
by using the ever-growing computer processing power, methodological innovations, 
and the emergence of vast quantities of data in the digital age to achieve better 
knowledge about social phenomena. The central thesis of this book is that such 
computational approaches have enormous potential to advance conflict research. 
Our goal in this introductory chapter, and in the book as a whole, is (a) to 
demonstrate how such computational approaches can be used to improve our understanding 
of conflict at any scale and (b) to call for the consolidation of computational conflict 
research as a field of research that collectively aims to gather such insights. 

We argue that computational conflict research, i.e., the use of computational 
approaches to study conflict, can advance conflict research through at least three 
major innovative pathways: 


1. The identification of spatio-temporal dynamics and mechanisms behind conflicts 
through simulation models that make it possible to track the interaction of actors 
in conflict scenarios and to understand the emergence of aggregated, macro-level 
consequences. 

2. The availability of new, fine-grained datasets of conflict events at all scales 
from the local to the global (“big data”), made possible by digitization together 
with novel techniques to collect, store, and analyze such data. 

3. The combination of simulation and other advanced computational methods 
with this vast, fine-grained empirical data. 


To demonstrate the potential of these innovations for conflict research, this book 
brings together a set of (a) chapters that discuss these advances in data availability 
and guide the reader through some of these computational methods and (b) original studies 
that showcase how various cutting-edge computational approaches can lead to 
new insights on conflict at various geographic scales and degrees of violence. 
Following Hillmann (2007, p. 432), conflict is understood in this book as opposition, 
tensions, clashes, enmities, struggles, or fights of various intensities between social 
units. This definition is deliberately broad: Social units can range from small 
groups of individuals without formal organization to institutions with differentiated 
organizational structure to large and complex units such as entire nation-states or 
even groups of countries. Examples in this book will include street protesters, 
terrorist organizations, rebel groups, political parties, and sets of parliaments. 
Conflict, as understood here, can be non-violent or violent.2 Some studies in this 
book will deal with the former, covering social, non-violent conflicts such as clashes 
between political parties in parliamentary debates or normative shifts in social 
networks, while others will deal with the latter, including the spatial structure of civil 
war violence or the extortion mechanisms rebel groups use to exploit enterprises.3 

This chapter is structured as follows: We first give a short summary of the 
rise of computational social science for readers who are unfamiliar with this trend 
(Sect. 2). Next, we discuss how computational approaches have already enriched 
conflict research (Sect. 3). Finally, we give an overview of the contributions of this 
volume, laying out how they move the field forward. We also offer a visualization 
that makes it possible to map the field of computational conflict research in a two-dimensional 
space (Sect. 4). 


2 The Rise of Computational Social Science 


The modern field of computational social science emerged at the end of the first 
decade of this century, starting with a “Perspective” article in Science (Lazer et al., 
2009), mainly by scientists in North America, followed by a “Manifesto” (Conte 
et al., 2012) by scientists in Europe. These led to a great many books, conferences, 
summer schools, institutes, and novel job postings and titles all over the world in 
recent years. 

The main driver of the popularity of computational social science in the past 
10 years was clearly the new availability of large-scale digital data that humans 
now create by spending time online and by carrying mobile devices. This data 
accumulates in companies, government agencies, and on the devices of users. Just 
as a matter of business, engineers from the tech, internet, and information industries 
enabled, created, and processed increasing amounts of social and cultural data, and, 
as a consequence, they interfered more and more with large-scale societal processes 
themselves. Business development in this situation requires not only technical skills 
but also skills in the interpretation of social data, leading to the emergence of 
jobs for data scientists or social data scientists who combine knowledge in 
statistics, computer science, and, in particular, the social sciences. 

2 Gleditsch, in the chapter “Advances in Data on Conflict” of this book, will identify too narrow 
definitions of conflict that focus on violent conflict alone as one major obstacle that prevents 
progress in the field. He gives the example of recent street protests in Venezuela and argues that it 
would be “absurd” not to treat this as a conflict due to the absence of organized armed violence in 
the sense of a civil war. By taking a broad perspective and combining research on non-violent and 
violent conflict, this book aims to contribute to overcoming this issue. 

3 When we use the term “violence” to classify the research conducted in this book, we use it to 
denote physical violence. Some definitions of violence, e.g., by the World Health Organization, 
also treat psychological and emotional violence as sub-types of violence (Krug et al., 2002). 
Sometimes verbal violence is described as another type of violence (Nieto et al., 2018). In that 
sense, words (and thus, for instance, also the analyses of parliamentary speeches in Part 
II of the book, which we titled “Computational Research on Non-Violent Conflict”) could also be 
seen as potentially dealing with violence, if such a broader definition of violence were used. Yet 
we deem the distinction between conflict based on physical force (violence) and that carried out 
without the use of physical force (non-violently) meaningful and thus decided to stick to it. In 
principle, computational conflict research could of course cover any kind of violence. 

While the “social data scientist” is to a large extent an invention of the 
business world, the term “computational social science” comes from academia, 
although, somewhat surprisingly, not so much from within the social sciences 
themselves. Many authors of the two seminal papers (Lazer et al., 2009; Conte 
et al., 2012) did not obtain their undergraduate training in a social science. They 
often conducted research in fields such as complex systems, sociophysics, network 
science, simulation, and agent-based modeling before the term computational social 
science appeared. Research on complex systems and related fields typically aims at 
a fundamental understanding of dynamical processes with many independent actors 
(Simon, 1962). The mode of computation is usually computer simulation (Gilbert 
and Troitzsch, 2005) and the relation to data is often focused on the explanation of 
large-scale empirical regularities: the stylized facts, with fat-tailed distributions as 
the seminal example (Price, 1976; Gabaix et al., 2003). 

Due to the focus on universal mechanisms and the interdisciplinarity of contributors, 
including those from outside the traditional social sciences, computational social science 
represents an integrated approach to the social sciences, where the traditional social 
and behavioral sciences serve as different perspectives for modeling how people 
think (psychology), handle wealth (economics), relate to each other (sociology), 
govern themselves (political science), and create culture (anthropology) (Conte 
et al., 2012), or operate in geographical space (Torrens, 2010) to gain quantitative 
and qualitative insight about societal questions and real-world problems (cf. 
Watts, 2013; Keuschnigg et al., 2017). These aspects can be subsumed under the 
current, broad definition by Amaral (2017, p. 1) that understands computational 
social science as an “interdisciplinary and emerging scientific field [that] uses 
computational methods to analyze and model social phenomena, social structures, 
and collective behavior.” Following this definition, computational refers to at least 
three very different aspects of computing: the retrieval, storage, and processing 
of massive amounts of digital behavioral data; the development of algorithms for 
inference, prediction, and automated decision-making based on that data; and the 
implementation of dynamic computer models for the simulation of social processes. 

Many other disciplines have had a “computational” branch for much longer 
than the social sciences. There is computational physics, computational biology, 
and computational economics, to name a few. Computational sociology, too, 
was formulated before the omnipresence of the internet 
and large-scale digital behavioral data, acknowledging that any modern science 
builds not only on a theoretical and an empirical, but also on a computational 
component (Hummon and Fararo, 1995). 

Today, computational social science is sometimes reduced to being a new science 
that provides methods for retrieving digital behavioral data for the analysis of 
people's social behavior online. Our perspective on the field is broader, including the 
older simulation-based focus on fundamental mechanisms (e.g., Epstein and Axtell, 
1996; Axelrod, 1997). We see computational social science as contributing to a basic 
scientific understanding of social processes, to the development of new methods, and to 
societal insights informing current political debates and international relations. In 
the following section, we discuss how computational social science, understood in 
this broad sense, has already influenced the field of conflict research. 


3 Computational Approaches to Conflict Research 


The use of computational approaches in conflict research is not a new endeavor. A 
broad range of methods, techniques, and systems have been developed in various 
scientific fields, including the areas of machine learning, social network analysis, 
geographic information systems, and computational simulation. In the following, we 
give a short overview of these strands of research. This review should be understood 
as a starting point rather than an end point in mapping the field. Hence, we do 
not claim completeness and interested readers are invited to explore the works cited 
in the other chapters of this book, which may serve as useful additional reference 
points. 

Machine learning methods define algorithms, or sets of step-by-step computational 
procedures, with the aim of finding an appropriate model that describes 
non-trivial regularities in data. In conflict research, these methods were initially 
adopted to develop predictive models of conflict outcomes (Schrodt, 1984, 1987, 
1990, 1991) and conflict mediation attempts (Trappl, 1992; Fürnkranz et al., 1994; 
Trappl et al., 1996, 1997).4 With the growing availability of detailed empirical 
data in the last decades, however, the focus has shifted from predicting outcomes 
in ongoing conflicts to deriving early warning indicators of conflicts in the hope of 
preventing them (Schrodt, 1997; Beck et al., 2000; Schrodt and Gerner, 2000; Trappl 
et al., 2006; Subramanian and Stoll, 2006; Zammit-Mangion et al., 2012; Perry, 
2013; Helbing et al., 2015). Despite recent major efforts to develop early 
warning systems (Trappl, 2006; O'Brien, 2010; Guo et al., 2018), no system has yet 
established itself as a reliable tool for policy-making (Cederman and Weidmann, 
2017). Cederman and Weidmann (2017) identify several pitfalls and provide a 
number of recommendations on how existing work on data-driven conflict research 
can be improved. 

Most of these machine learning methods can be applied not only to numeric data 
but also to symbolic data (i.e., text, images, and video). These methods are subsumed 
under the label of computer-aided content analysis (Weber, 1984). Conflict 
research benefits from these methods by identifying the relation between particular 
actors and indicators of violence in textual data, thus supporting the analysis of, 
e.g., online hate speech campaigns, cyber mobbing, and social media flame wars. 
Examples include analyses of collective sense-making after terrorist attacks based 
on Twitter comments (Fischer-Preßler et al., 2019) and verbal discrimination against 
African Americans in the media based on online newspaper articles (Leschke and 
Schwemmer, 2019). Further details and a survey on the use of these methods in 
conflict research are available in the chapter “Text as Data for Conflict Research” of 
this book. 

4 A more detailed overview of the initial uses of these methods in conflict research can be found 
in Trappl and Miksch (1991). 

Despite the fact that conflicts are strongly influenced by network dynamics, 
machine learning methods rarely integrate these dynamics to study interaction 
between more than two groups. Social network analysis complements these methods 
and sheds light on the structural and dynamic interaction aspects of the multiple 
groups involved in conflicts (Wolfe, 2004). Social network analysis has been 
recognized as useful for mapping groups' structure, identifying the division of 
power within these groups, and uncovering their internal dynamics, patterns of 
socialization, and the nature of their decision-making processes (Kramer, 2017). 
Hammarström and Heldt (2002) successfully applied network analysis methods to 
study the diffusion of interstate military conflict, while Takács (2002) investigated 
the influences of segregation on the likelihood of intergroup conflicts when individuals 
or groups compete for scarce resources. More specifically, the descriptive and 
explanatory potential of social network analysis has been demonstrated in studies of 
terrorist activities and organizations (Perliger and Pedahzur, 2011; 
Deutschmann, 2016), of the influence of social network structure on 
the flexibility of a rebel group in peace negotiations (Lilja, 2012), and of 
the influence of antigovernment network structures (i.e., alliances and strategic 
interactions) in generating conflictual behavior (Metternich et al., 2013). 

Analogous to social network analysis, which enhances conflict studies by integrating 
network dynamics, geographic information systems (GIS) offer techniques 
for refining these studies through the incorporation of spatial data into the analysis 
(Branch, 2016). Although spatial relationships have often been analyzed in 
a general way in qualitative conflict research, the recent advances in computing 
power and the increasing availability of disaggregated and high-resolution spatial 
data have enabled more sophisticated and quantitative studies (Stephenne et al., 
2009; Gleditsch and Weidmann, 2012). Despite these advances, introducing the 
spatial dimension to conflict research still poses several challenges: practical, 
because of the lack of high-quality open-source GIS software tools and the lack 
of educational training in spatial methods and programming; theoretical, related 
to different aspects of the definition of “space,” such as choosing spatial units and 
the appropriate resolution for analysis, as well as the right measure of distance; and 
statistical, because of the dependent nature of spatial units of analysis and their 
interaction with time (Stephenne et al., 2009; Gleditsch and Weidmann, 2012). 

While the previously discussed data-driven approaches are primarily focused 
on uncovering correlations in empirical data, computational simulation-based 
approaches create a bridge between theory and data that can be used to demonstrate 
causality. Computational conflict simulation models first appeared during the Cold 
War (Cioffi-Revilla and Rouleau, 2010). These models used systems of ordinary 
differential equations (ODEs) or difference equations and were implemented using 
the system dynamics approach (Forrester, 1968). Bremer and Mihalka (1977), 
following a complex systems approach, developed a discrete model composed of 
states arrayed geographically and with imperfect perception, provided each with a 
quantity of “power,” endowed each with action rules based on realistic principles, 
and set them off to interact with one another in iterative cycles of conflict and 
cooperation. This model aimed to investigate the likelihood that a power equilibrium 
can be achieved under particular conditions (Duffy and Tucker, 1995). Cusack and 
Stoll (1990) extended the Bremer and Mihalka model by incorporating more realistic 
rules in which states play multiple roles, and they assessed the effects that different 
sets of rules have on the survival and endurance of states and state systems (Duffy 
and Tucker, 1995). In line with the complex systems approach, Axelrod (1995) 
proposed a model to understand the future of global politics through extortion and 
cooperation among states. 

Although different approaches have been used to model and investigate conflicts 
over the decades, system dynamics was the dominant one until the introduction 
of the agent-based modeling (ABM) approach (Burton et al., 2017). Cederman 
(2002) presented a series of agent-based models that trace complex macrohistorical 
transformations of actors. He argued that in addition to the advantages usually 
attributed to ABM (i.e., bounded rationality and heterogeneity of entities), this 
technique also promises to overcome the reification of actors by allowing 
higher-level structures to be superimposed on a lower-level grid of primitive actors. The 
groundwork for the use of ABM in conflict research was laid by Epstein (2002), 
who analyzed the conditions under which individuals may mobilize and protest. 
He examined the complex dynamics of decentralized rebellion and interethnic civil 
violence and factors such as the legitimacy of a political system, risk-aversion of 
potential protesters, police strength, and geographic reach. Epstein's model of civil 
conflict has subsequently been extended (Ilachinski, 2004; Goh et al., 2006; Lemos 
et al., 2016; Fonoberova et al., 2018). Bhavnani et al. (2008) created an agent-based 
computational framework that incorporates factors such as ethnicity, polarization, 
dominance, and resource type, allowing the study of the relationship between natural 
resources, ethnicity, and civil war. Similarly, Cioffi-Revilla and Rouleau (2010) 
developed a model that considers how freedom of social interaction within a state 
may lead to rebellion and possibly regime change. 
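
To make the logic of such agent-based models more concrete, the following minimal Python sketch illustrates the kind of individual-level activation rule commonly associated with Epstein's (2002) civil violence model: an agent's grievance grows with hardship and shrinks with perceived legitimacy, and the agent becomes active once grievance exceeds the perceived net risk of arrest by a threshold. This is a simplified reading, not a replication of the original model or of the extensions cited above. The names (Citizen, grievance, arrest_probability, is_active) and the default values for k and threshold are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Citizen:
    hardship: float       # perceived hardship, assumed to lie in [0, 1]
    risk_aversion: float  # individual risk aversion, assumed to lie in [0, 1]


def grievance(citizen: Citizen, legitimacy: float) -> float:
    """Grievance rises with hardship and falls with regime legitimacy."""
    return citizen.hardship * (1.0 - legitimacy)


def arrest_probability(cops: int, actives: int, k: float = 2.3) -> float:
    """Estimated arrest probability from the local cop-to-active ratio.

    k = 2.3 is an illustrative constant chosen so that one cop facing one
    active agent yields a probability of roughly 0.9.
    """
    ratio = cops / max(actives, 1)  # guard against division by zero
    return 1.0 - math.exp(-k * ratio)


def is_active(citizen: Citizen, legitimacy: float, cops: int,
              actives: int, threshold: float = 0.1) -> bool:
    """An agent rebels if grievance minus perceived net risk exceeds a threshold."""
    net_risk = citizen.risk_aversion * arrest_probability(cops, actives)
    return grievance(citizen, legitimacy) - net_risk > threshold


if __name__ == "__main__":
    agent = Citizen(hardship=0.9, risk_aversion=0.8)
    # Low legitimacy, no police in sight: the agent becomes active.
    print(is_active(agent, legitimacy=0.2, cops=0, actives=1))
    # Same agent, heavy police presence: the agent stays quiet.
    print(is_active(agent, legitimacy=0.2, cops=5, actives=1))
```

In a full simulation, this rule would be evaluated for every agent on a spatial grid at each time step, with the cop and active counts taken from the agent's local neighborhood; the sketch isolates only the decision rule.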

Despite the long-standing use of data-driven and simulation-based computational 
approaches to conflict research, the field of computational conflict research is not 
clearly defined yet. The efforts are spread over several scientific fields, most of 
which do not have the conflict domain as their main target, but rather 
use it as an application domain in which their methods can be applied. Hence, 
our book aims at contributing to advancing the field of computational conflict 
research through a more complete and systematic analysis of what can be done with 
computational approaches in studying conflict. In the following section, we describe 
more concretely how the contributions of this book help achieve this goal. 
4 The Contributions of This Book 


This book covers computational conflict research in a range of facets, with its contributions 
using a variety of different approaches on several dimensions (methodology, 
conflict scale, geographic focus, etc.). It thereby addresses the full scope of the field, 
including primarily data-driven as well as primarily simulation-based approaches. 
The book brings together contributions from leading and emerging scholars with 
a diversity of disciplinary backgrounds from physics, mathematics, and biology to 
computer and data science to sociology and political science. The volume is also 
a truly international endeavor, with its authors’ institutional affiliations reaching 
across thirteen countries on three continents. 

Methodologically, the book covers a variety of computational approaches from 
text mining and machine learning to agent-based modeling and simulation to social 
network analysis. Table 1 gives a more fine-grained overview of the different 
methodologies and computational approaches used. 

Table 1 Computational approaches used/covered in the chapters of this book 

Chapter  Short title                               Computational approach(es)
2        Advances in Data on Conflict              N/A
3        Text as Data for Conflict Research        Machine learning
                                                   Automated content analysis
                                                   Topic modeling
                                                   Text mining
4        Relational Event Models                   Social network analysis
                                                   Relational event models
5        Migration Policy Framing                  Machine learning
                                                   Topic modeling
6        Norm Conflict in Social Networks          Social network analysis
                                                   Agent-based modeling
7        Fate of Social Protests                   Agent-based modeling
8        Non-state Armed Groups                    Social network analysis
                                                   Agent-based modeling
                                                   Markov chain Monte Carlo
                                                   Hawkes process
9        Violence Against Civilians                Matched wake analysis
10       Conflict Diffusion over Continuous Space  Spatial statistics
                                                   Continuous space model
                                                   Log-Gaussian Cox process model
11       Rebel Group Protection Rackets            Agent-based modeling

Fig. 1 Overview of the geographic location of the case studies contained in this book 

Regarding data, several chapters make use of empirical conflict data that has only 
recently become available in such detail, be it large corpora of text or fine-grained, 
geo-tagged information on conflict events coded globally from news reports.5 
Geographically, these case studies add up to a comprehensive set of analyses of 
recent conflicts that spans multiple continents, as the map in Fig. 1 reveals. These 
contentions range from conflict lines in parliamentary debates on migration policy 
in the USA and Canada from 1994 to 2016 (chapter “Migration Policy Framing”) 
and the representation of street protest in Germany (2014-15) and Iran (2017-18) 
on social media (chapter “Fate of Social Protests”) to terrorist attacks in Colombia, 
Afghanistan, and Iraq between 2001 and 2005 (chapter “Non-state Armed Groups”) 
and violence against civilians in the Democratic Republic of Congo in 1998-2000 
(chapter “Violence Against Civilians”) to conflict diffusion in South Sudan between 
2014 and 2018 (chapter “Conflict Diffusion over Continuous Space”) and rebel 
group behavior in Somalia from 1991 to the present day (chapter “Rebel Group 
Protection Rackets”). Hence, a broad range of recent conflicts is covered. 

The book is structured in three parts. Part I focuses on data and methods 
in computational conflict research and contains three contributions. Part II deals 
with non-violent, social conflict and comprises three chapters. Part III is about 
computational approaches to violent conflict and covers four chapters. In the 
following, we give a short overview of these individual chapters. 

In the chapter “Advances in Data on Conflict,” Kristian Gleditsch, building on 
more than two decades of experience in peace and conflict studies, takes a look 
at the role of data in driving innovation in the field. He argues that the growth of 
systematic empirical data has been a central innovative force that has brought the 
field forward. Drawing on several examples, he demonstrates how data has served 
as a source of theoretical innovation in the field. This progress in data availability, 
he argues, has helped generate new research agendas. His contribution ends with an 
inventory of the most valuable data sources on conflict events to date, which, we 
believe, may be highly useful for readers interested in conducting their own research 
on conflicts globally. 

5 For an overview of recent developments in data on conflict see the contribution by Gleditsch in 
the chapter “Advances in Data on Conflict” of this volume. 

In the chapter “Text as Data for Conflict Research,” Seraphine F. Maerz and 
Cornelius Puschmann give insights into how text can be used as data for conflict 
research. Arguing that computer-aided text analysis offers exciting new possibilities 
for conflict research, they delve into computational procedures that make it possible to analyze 
large quantities of text, from supervised and unsupervised machine learning to 
more traditional forms of content analysis, such as dictionaries. To illustrate these 
approaches, they draw on a range of example studies that investigate conflict based 
on text material across different formats and genres. This includes both conflict 
verbalized in news media, political speeches, and other public documents and 
conflict that occurs directly within online spaces like social media platforms and 
internet forums. Finally, they highlight cross-validation as a crucial step in using 
text as data for conflict research. 
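
As a rough illustration of the simplest of these procedures, the dictionary approach, the following Python sketch scores a text by the share of its tokens that appear in a word list. It is not taken from the chapter by Maerz and Puschmann. The word list CONFLICT_TERMS and the function names are hypothetical, and a real application would rely on a validated dictionary and more careful preprocessing.

```python
import re

# Hypothetical, deliberately tiny dictionary; real studies use validated word lists.
CONFLICT_TERMS = {"attack", "clash", "fight", "protest", "violence"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_score(text: str, terms: set[str] = CONFLICT_TERMS) -> float:
    """Return the share of tokens that match the dictionary (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in terms)
    return hits / len(tokens)


if __name__ == "__main__":
    # Two of the five tokens ("protest", "clash") are dictionary terms.
    print(dictionary_score("Protest turned into a clash"))
```

Because such scores depend entirely on the chosen word list, comparing them against hand-coded examples is one way to carry out the validation step highlighted above.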

In the chapter "Relational Event Models," Laurence Brandenberger introduces 
relational event models (REMs) as a powerful tool to examine how conflicts arise 
through human interaction and how they evolve over time. Building on event history 
analysis, these models combine network dependencies with temporal dynamics and 
allow for the analysis of social influencing and group formation patterns. The added 
information on the timing of social interactions and the broader network in which 
actors are embedded can uncover meaningful social mechanisms, Brandenberger 
argues. To illustrate the added value of REMs, the chapter showcases two empirical 
studies. The first one shows that countries engaging in military actions in the Gulf 
region do so by balancing their relations, i.e., by supporting allies of their allies 
and opposing enemies of their allies. The second one shows that party family 
homophily guides parliamentary veto decisions and provides empirical evidence 
of social influencing dynamics among European parliaments. Brandenberger also 
references her R package, which allows interested conflict researchers to apply 
REMs. 

The chapter “Migration Policy Framing” opens Part II of the book with research 
on non-violent, social conflicts. Sanja Hajdinjak, Marcella H. Morris, and Tyler 
Amos put the text-as-data approach that was laid out in the chapter “Text as Data 
for Conflict Research” into empirical practice. Drawing on more than a decade of 
parliamentary speeches from the USA and Canada, they analyze how parties frame 
migration topics in political discourse. Building on work that argues that migration 
falls in a gap between established societal cleavages over which parties do not have 
robust, issue-specific ownership, Hajdinjak et al. argue that parties engage in debates 
on migration topics by diverting attention to areas in which they have established 
issue ownership. Using structural topic models, they test this assertion by measuring 
the differences in salience and framing of migration-related topics over time in the 
debates of the lower houses of Canada and the USA. Doing so, they do indeed find 
that, in both countries, liberals frame migration differently than conservatives. 

In the chapter “Norm Conflict in Social Networks,” an interdisciplinary team 
of psychologists, sociologists, and physicists—Julian Kohne, Natalie Gallagher, 
Zeynep Melis Kirgil, Rocco Paolillo, Lars Padmos, and Fariba Karimi—model 
the spread and clash of norms in social networks. They argue that arriving at an 
overarching normative consensus in groups with different social norms can lead 
to intra- and intergroup conflict. Kohne et al. develop an agent-based model that 
simulates the convergence of norms in social networks with two different 
groups in different network structures. Group sizes, levels of homophily, and 
the initial distribution of norms within the groups can all be adjusted. Agents 
in the model update their norms according to the classic Granovetter threshold 
model, where a norm changes when the proportion of the agents' ego-network 
displaying a different norm exceeds the agents' threshold. Conflict, in line with 
Heider's balance theory, is operationalized by the proportion of edges between 
agents that hold a different norm in converged networks. Their results suggest 
that norm change is most likely when norms are strongly correlated with group 
membership. Heterophilic network structures, with small to middling minority 
groups, exert the most pressure on groups to conform to one another. While the 
results of these simulations demonstrate that the level of homophily determines 
the potential conflict between groups and within groups, this contribution also 
showcases the impressive possibilities of ever-increasing computing power and how 
they can be used for conflict research: Kohne et al. ran their agent-based simulation 
on a high performance computing cluster; their simulation took about 315 hours to 
complete and generated 40 Gigabytes of output data. 
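
The update rule and the conflict measure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by Kohne et al.: the toy network, the function names `update_norms` and `conflict_share`, the binary norms, the synchronous update order, and the threshold values are all assumptions made for the example.

```python
def update_norms(neighbors, norms, thresholds):
    """One synchronous round of threshold updating.

    An agent switches norm when the share of its neighbors holding
    a different norm exceeds its personal threshold.
    """
    new_norms = dict(norms)
    for agent, nbrs in neighbors.items():
        if not nbrs:
            continue  # isolated agents never change
        differing = sum(1 for n in nbrs if norms[n] != norms[agent])
        if differing / len(nbrs) > thresholds[agent]:
            new_norms[agent] = 1 - norms[agent]  # binary norms: 0 or 1
    return new_norms


def conflict_share(neighbors, norms):
    """Proportion of edges whose two endpoints hold different norms."""
    edges = {frozenset((a, b)) for a, nbrs in neighbors.items() for b in nbrs}
    if not edges:
        return 0.0
    mixed = sum(1 for e in edges if len({norms[v] for v in e}) == 2)
    return mixed / len(edges)


# Hypothetical toy network: an undirected path a - b - c - d.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
norms = {"a": 1, "b": 0, "c": 1, "d": 0}
# A threshold of 1.0 can never be exceeded, so a and d never switch.
thresholds = {"a": 1.0, "b": 0.5, "c": 0.6, "d": 1.0}

initial_conflict = conflict_share(neighbors, norms)

# Iterate until no agent changes (capped in case the dynamics oscillate).
for _ in range(100):
    updated = update_norms(neighbors, norms, thresholds)
    if updated == norms:
        break
    norms = updated

final_conflict = conflict_share(neighbors, norms)
print(norms, initial_conflict, final_conflict)
```

Even in this tiny example, the network settles into a state in which some edges still connect agents with different norms, which is the quantity the chapter uses to operationalize conflict.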

Granovetter's threshold model and the spread of information in networks also 
play a role in the chapter “Fate of Social Protests,” in which Ahmadreza 
Asgharpourmasouleh, Masoud Fattahzadeh, Daniel Mayerhoffer, and Jan Lorenz simulate 
conditions for the emergence of social protests in an agent-based model. They draw 
on two recent historical protests from Iran and Germany to inform the modeling 
process. In their agent-based model, people, who are interconnected in networks, 
interact and exchange their concerns on a finite number of topics. They may start 
to protest either because their concern or the fraction of protesters in their social 
contacts exceeds their protest threshold, as in Granovetter's threshold model. In 
contrast to many other models of social protests, their model also studies the 
coevolution of topics of concern in the public that is not (yet) protesting. Given 
that often a small number of citizens starts a protest, its fate depends not only on 
the dynamics of social activation but also on the buildup of concern with respect 
to competing topics. Asgharpourmasouleh et al. argue that today, this buildup often 
occurs in a decentralized way through social media. Their agent-based simulation 
reproduces the structural features of the evolution of the two empirical cases 
of social protests in Iran and Germany. 

In the chapter “Non-state Armed Groups,” an interdisciplinary team with 
backgrounds in data science, philosophy, biology, and political science (Simone 
Cremaschi, Baris Kirdemir, Juan Masullo, Adam R. Pah, Nicolas Payette, and Rithvik 
Yarlagadda) looks at the network structure of non-state armed groups (NSAGs) 
in Colombia, Iraq, and Afghanistan from 2001 to 2005. They use a self-exciting 
temporal model to ask if the behavior of one NSAG affects the behavior of other 
groups operating in the same country and if the actions of groups with actual ties 
(i.e., groups with some recognized relationship) have a larger effect than those with 
environmental ties (i.e., groups simply operating in the same country). The team 
finds mixed results for the notion that the actions of one NSAG influence the actions 
of others operating in the same conflict. In Iraq and Afghanistan, they find evidence 
that NSAG actions do influence the timing of attacks by other NSAGs, whereas 
there is no discernible link between NSAG actions and the timing of attacks in 
Colombia. Across all three cases, they consistently find no significant difference 
between the effects of actual and environmental ties. 
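
One common way to formalize a self-exciting temporal model is a Hawkes process, in which each past event temporarily raises the rate of future events. The Python sketch below illustrates that general idea only. The exponential decay kernel, the parameter names (`mu`, `alpha_self`, `alpha_cross`, `beta`), and the event times are assumptions made for illustration, not the specification estimated in the chapter.

```python
import math


def intensity(t, own_events, other_events, mu, alpha_self, alpha_cross, beta):
    """Attack rate of one group at time t under a Hawkes-type model.

    mu is the baseline rate; every earlier event adds a boost
    (alpha_self or alpha_cross) that decays exponentially at rate beta.
    """
    def excitation(events, alpha):
        return sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)

    return mu + excitation(own_events, alpha_self) + excitation(other_events, alpha_cross)


# Hypothetical attack times (in days) for two groups in the same country.
group_a = [1.0, 4.0]
group_b = [3.5]

# Rate for group A at day 5, with and without influence from group B.
with_cross = intensity(5.0, group_a, group_b, mu=0.1,
                       alpha_self=0.5, alpha_cross=0.3, beta=1.0)
without_cross = intensity(5.0, group_a, group_b, mu=0.1,
                          alpha_self=0.5, alpha_cross=0.0, beta=1.0)
print(with_cross, without_cross)
```

In this framing, asking whether one group's behavior affects another amounts to asking whether the cross-excitation weight differs from zero, and comparing actual with environmental ties amounts to comparing such weights across types of ties.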

In the chapter “Violence Against Civilians,” political scientists Andrea Salvi, 
Mark Williamson, and Jessica Draper examine why some conflict zones exhibit 
more violence against civilians than others. They note that past research has 
emphasized ethnic fractionalization, territorial control, and strategic incentives, 
but overlooked the consequences of armed conflict itself. This oversight, Salvi et 
al. argue, is partly due to the methodological hurdles of finding an appropriate 
counterfactual for observed battle events. In their contribution, they aim to test 
empirically the effect of instances of armed clashes between rebels and the 
government in civil wars on violence against civilians. Battles between belligerents 
may create conditions that lead to surges in civilian killings as combatants seek to 
consolidate civilian control or inflict punishment against populations residing near 
areas of contestation. Since there is no relevant counterfactual for these battles, they 
utilize road networks to help build a synthetic risk-set of plausible locations for 
conflict. Road networks are crucial for the logistical operations of a civil war and are 
thus the main conduit for conflict diffusion. As such, the majority of battles should 
take place in the proximity of road networks; by simulating events in the same 
geographic area, Salvi et al. are able to better approximate locations where battles 
hypothetically could have occurred, but did not. They test this simulation approach 
using a case study of the Democratic Republic of the Congo (1998-2000) and 
model the causal effect of battles using a spatially disaggregated framework. Their 
work contributes to the literature on civil war violence by offering a framework for 
crafting synthetic counterfactuals with event data, and by proposing an empirical 
test for explaining the variation of violence against civilians as a result of battle 
events. 

In the chapter “Conflict Diffusion over Continuous Space,” statistician Claire 
Kelling and political scientist YiJyun Lin study the diffusion of conflict events 
through an innovative application of methods of spatial statistics. They investigate 
how spatial interdependencies between conflict events vary depending on several 
attributes of the events and actors involved. Kelling and Lin build on the fact— 
similarly observed by Gleditsch in the chapter “Advances in Data on Conflict”—that 
due to recent technological advances, conflict events can now be analyzed using data 
measured at the event level, rather than relying on aggregated units. Looking at the 
case of South Sudan, they demonstrate how the intensity function defined by the 
log-Gaussian Cox process model can be used to explore the complex underlying 
diffusion mechanism under various characteristics of conflict events. Their findings 
add to the explanation of the process of conflict diffusion, e.g., by revealing that 
battles with territorial gains for one side tend to diffuse over larger distances than 
battles with no territorial change, and that conflicts with longer duration exhibit 
stronger spatial dependence. 

In the chapter “Rebel Group Protection Rackets,” Frances Duffy, Kamil C. 
Klosek, Luis G. Nardin, and Gerd Wagner present an agent-based model that 
simulates how rebel groups compete for territory and how they extort local 
enterprises to finance their endeavors. In this model, rebel groups engage in a 
series of economic transactions with the local population during a civil war. These 
interactions resemble those of a protection racket, in which aspiring governing 
groups extort the local economic actors to fund their fighting activities and control 
the territory. Seeking security in this unstable political environment, these economic 
actors may decide to flee or to pay the rebels in order to ensure their own protection, 
impacting the outcomes of the civil war. The model reveals mechanisms that are 
helpful for understanding violence outcomes in civil wars, and the conditions that 
may lead certain rebel groups to prevail. By simulating several different scenarios, 
Duffy et al. demonstrate the impact that different security factors have on civil war 
dynamics. Using Somalia as a case study, they also assess the importance of rebel 
groups’ economic bases of support in a real-world setting. 

The agent-based simulation models constructed in several of these chapters are 
all available online. They can be downloaded or applied directly in the web browser. 
Interested readers can thus replicate the outcomes presented in this book, adjust 
parameters, and build on the code to advance their own research. An overview of 
this online material is available at the end of this book, together with information on 
further supplementary material, such as replication files and links to the data sources 
used. 

Figure 2 shows how the chapters that are based on empirical studies (i.e., Parts 
II and III of the book) can be placed on a two-dimensional space that can be 
interpreted as representing the field of computational conflict research. In this 
space, the vertical axis describes the intensity of the conflict studied, running 
from “non-violent” to “violent.” The horizontal axis describes the computational 
approach that is used, ranging from “simulation-based” to “data-driven.” As can 
be seen, the book at hand covers all four quadrants that constitute the field. The 
chapter “Norm Conflict in Social Networks,” for example, where the interaction of 
actors with different social norms is studied, is an example of computational conflict 
research that is based entirely on simulations and that deals with non-violent, social 
conflict. The chapter “Migration Policy Framing,” in which party differences in 
parliamentary debates are analyzed, also deals with non-violent conflict, but is 
mostly data-driven. In the upper left quadrant, we see the chapter “Rebel Group 
Protection Rackets,” which, with its agent-based model on rebel group protection 
rackets, deals with violent conflict and is mainly simulation-based, although some 
parameters are adjusted according to the real-world case of Somalia as mentioned 
above (accordingly, it is placed somewhat towards the center of the horizontal axis, 
denoting a mix of both simulation-based and data-driven approaches). Finally, in 
the upper right corner, we see, for instance, the chapter “Conflict Diffusion over 
Continuous Space,” which deals with the diffusion of violent conflict events (e.g., 
battles) in continuous space and is mainly data-driven. The chapter “Non-state 
Armed Groups” both uses a large-scale dataset and draws on simulation techniques 
and is thus placed toward the center of the horizontal axis. 

Fig. 2 Positioning the contributions of this book in a two-dimensional space that forms the field 
of computational conflict research 

Although this two-dimensional representation is of course quite simple, and 
perhaps even simplistic, it should in theory be possible to place any research 
conducted in the field of computational conflict research, including the studies 
discussed in Sect. 3, somewhere in this space. We thus hope this representation 
may prove a useful heuristic for the field. 

By bringing together novel research by an international team of scholars from a 
range of fields, this book strives to contribute to consolidating the emerging field 
of computational conflict research. It aims to be a valuable resource for students, 
scholars, and a general audience interested in the prospects of using computational 
social science to advance our understanding of conflict dynamics in all their facets. 

Abstract In this chapter, I review the role of data in driving innovation in research 
on conflict. I argue that progress in conflict research has been strongly related to the 
growth of systematic empirical data. I draw on a series of examples to show how data 
have served as a source of theoretical innovation. I discuss early models of conflict 
distributions and their enduring relevance in current discussion of conflict trends and 
the evidence for a decline in violence. I consider the interaction between theoretical 
models of conflict and empirical analysis of interstate conflict, as well as the rapid 
growth in disaggregated studies of civil war and developments in data innovation, 
which in turn help generate new research agendas. I conclude with some thoughts on 
key unresolved problems in current conflict research, namely the lack of attention to 
incompatibilities as the defining characteristics of conflict and accounting for scale 
and differences in event size. 


Keywords Conflict - Data - Models - Distributions - Progress 


1 Introduction: The Need for Data in Computational 
Social Science 


Conflict research has a long history, where efforts to record or measure conflict 
have a central place, but computational approaches to date have been less common. 
There are some notable exceptions that clearly demonstrate how computational 
approaches can be very useful in order to more explicitly explore counterfactuals 
and variation beyond what is available to us through the historical record (Bremer 
and Mihalka 1977; Cederman 1997). However, computational approaches tend 
to be the most compelling and effective when they are closely integrated with 
subject-specific theoretical puzzles and informed by empirical data. In this chapter, 
I review the role of data in driving innovation in research on conflict. 

The thesis I advance is that innovations in research on conflict often have 
followed new data developments. My argument is not that we can simply substitute 
theory with more data. Indeed, descriptive data rarely speak for themselves, and 
new and more detailed data will by themselves rarely lead to new theoretical 
breakthroughs. Indeed, the best data sources are usually based on solid initial 
theoretical foundations that guide the data collection efforts. However, it is difficult 
to find good examples where pure theory has had a major transformative impact 
on conflict research, in the absence of substantial engagement with empirical data. 
By contrast, data innovations have often helped restate and refine existing research 
agendas, and open new avenues for theoretical development. 

To justify this thesis, I draw on a series of examples of how data have served as a 
source of theoretical innovation, starting with early models of conflict distributions 
and their enduring relevance in current discussion of conflict trends, and then 
how more recent developments in data innovation contribute to new research 
agendas. I conclude with some thoughts on what I see as particularly important 
unresolved problems in current conflict research, namely the lack of attention to 
incompatibilities as the defining characteristics of conflict and accounting for scale 
and differences in event size. 


2 Conflict Research and the Impact of the Early 
Conflict Data 


If we define data rather widely as any empirical observations, then there is of course 
a long history of data in terms of detailed historical accounts of individual conflicts. 
Many of these could be highly analytical, such as Thucydides' (2000) discussion of the 
causes of the Peloponnesian War (believed to have been written around 410 BCE). However, 
historical accounts tend to be highly case-specific and are rarely comparative or 
systematic, in the sense of trying to cover a population of conflicts or focus on 
representative cases. Moreover, outside historical accounts, much of the general 
early research on conflict focused heavily on theory and analyzing conflict in an 
abstract manner, often detached from descriptive data altogether. Hobbes (1651, p. 
78), for example, argued that scholars should try to identify the general conditions 
that make war possible rather than individual events, just as “foul weather is not 
based on isolated showers, but inclination to rain.” This is in many ways a quite 
sophisticated anticipation of security dilemmas and efforts to develop more general 
theory. However, the lack of attention to data and observations also moved us 
further away from efforts to quantify risk, such as assessing how frequent conflict 
actually is and how much variation in inclination we see across specific types of 
conditions. Kelvin (1883) famously equated the quality of science to quantification. 
Without measuring conflict, we are often left without realistic assessments of risk. 
A statement indicating that something “is possible” tells us little more than that its 
probability is above 0 (impossible) but below 1 (certain). Harking back to 
the weather analogy, the nature and shape of weather distributions certainly play a 
central role in meteorology. Examining descriptive data on such distributions can 
help us keep track of how some places have more foul weather than others, and 
provides a basis for evaluating the possible causes. 

Against this more stringent yardstick, comparative data on conflict are a relatively 
recent development in the long history of conflict research. Collecting such data remained until 
recently largely a fringe activity, perhaps in part as a result of the policy orientation 
and aversion to statistics and quantitative methods among many traditional security 
studies scholars (see, e.g., Fazal 2016). One of the earliest datasets was collected 
by a sociologist, Sorokin (1957[1937]), who sought to use data to test his theory 
of conflict as a result of value divergence. With comprehensive information on the 
dates of key battles and troop sizes since antiquity, Sorokin's data were a major 
achievement. However, some features also limited their applicability. As the data 
were restricted to conflict between major powers, they could not speak to conflicts 
within states or conflicts with smaller powers. There is also no clear delineation of 
what makes states major powers, and a risk of circularity if influence is 
defined based on whether states tend to fight more. 

Wright (1942/1965) developed another influential dataset, intended to test a 
theory of peace as a result of active interstate organization and coordination that 
served to constrain possible factors that may lead to conflict if left unchecked. 
Although these data cover a shorter period than Sorokin's, they also included a more 
comprehensive delineation of states involved in conflict. Wright also devoted a great 
deal of attention to developing clear inclusion criteria for the data collection efforts. 
Given his background in law, it is perhaps not surprising that the definitions were 
skewed towards legal conceptions of war, but his efforts and structured approach 
had a major influence on subsequent efforts to define war. 

The most unusual data pioneer was Richardson (1960), a physicist who sought 
to identify a dataset of violent events to assist with more fundamental mathematical 
and statistical models of conflict. Richardson started to work collecting conflict data 
after World War I, but the data were not published until much later. Richardson's 
unit of analysis was deadly quarrels, based only on observable deaths. The incidents 
were classified by their severity in terms of fatalities, binned by “orders of 
magnitude” on a log10 scale. The data were intended to be exhaustive for events 
above magnitude 1.5 (about 32 fatalities). Richardson provides an important first discussion 
of some of the problems in counting wars from historical records—who are the 
combatants, when did a war start/end, how many died? In a pithy quote, Richardson 
(1960, p. 35) concluded that “thinginess fails” when we try to create data on wars as 
events, and “the concept of a war as a discrete thing does not quite fit all the facts.” 
Moreover, he was the first to explicitly use randomization to consider the sensitivity 
of his conclusions to decisions about lumping together events as a single war versus 
splitting episodes within longer wars. 
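
The arithmetic behind the magnitude scale can be made concrete with a short Python sketch. The function names `magnitude` and `magnitude_bin`, and the assumption that bins are centered on whole magnitudes (so that one bin runs from 1.5 to 2.5, the next from 2.5 to 3.5, and so on), are illustrative choices rather than a reproduction of Richardson's coding rules.

```python
import math


def magnitude(fatalities):
    """Richardson-style magnitude: base-10 logarithm of the death toll."""
    return math.log10(fatalities)


def magnitude_bin(fatalities):
    """Nearest whole order of magnitude (assumed bin centers 1, 2, 3, ...)."""
    return math.floor(magnitude(fatalities) + 0.5)


# Lower bound of exhaustive coverage: magnitude 1.5, i.e. roughly 32 deaths.
threshold = 10 ** 1.5

# Hypothetical death tolls, chosen to straddle the threshold.
for deaths in (31, 32, 1_000, 300_000):
    print(deaths, round(magnitude(deaths), 2), magnitude_bin(deaths))
```

Under these assumptions, an event with 31 deaths falls just below the coverage threshold, while one with 32 deaths falls just above it.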

Fig. 1 Quarrel frequency and severity, from Richardson (1948). Richardson's data are binned by 
severity, hence the horizontal lines 

One of the first conflict distribution models analyzed by Richardson (1948) 
considered the severity and frequency of conflict. He noted that there was a regular 
relationship between conflict severity and frequency, where the severity of a conflict 
in terms of the number of people killed x is inversely related to its frequency. More 
formally, the frequency of a conflict of severity x scales as $P(x) \propto x^{-\alpha}$, where $\alpha \approx 2$. 
Richardson’s data are displayed in Fig. 1, and provide one of the first empirical 
examples of a power law. One of the properties is that multiplying severity by a 
given factor divides the frequency by a fixed factor, whatever the starting severity. For example, with 
$\alpha = 1$, doubling severity halves frequency; with $\alpha \approx 2$, it cuts frequency to about a quarter. 
Power laws will appear as roughly linear if displayed on 
doubly logarithmic axes. 
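
To make the power-law relationship more tangible, the following Python sketch draws synthetic severities from a continuous power law and recovers the exponent with the standard maximum-likelihood estimator for continuous power laws. The data are simulated, not Richardson's, and the function names, the lower cutoff of 32 fatalities, and the sample size are assumptions made purely for illustration.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n severities from a continuous power law p(x) ~ x**(-alpha), x >= x_min."""
    # Inverse-transform sampling from the Pareto tail.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_alpha(values, x_min):
    """Maximum-likelihood estimate of alpha for a continuous power law."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
severities = sample_power_law(20_000, alpha=2.0, x_min=32.0, rng=rng)
alpha_hat = estimate_alpha(severities, x_min=32.0)

# Scale invariance: doubling severity divides the density by 2**alpha,
# whatever the starting severity.
ratio = 2.0 ** -alpha_hat
print(round(alpha_hat, 2), round(ratio, 2))
```

With an exponent close to 2, doubling severity reduces the density to roughly a quarter; an exact halving would correspond to an exponent of 1.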

As shown below in Fig. 2, we find a similar relationship for other conflict 
data sources as well, including more recent data on interstate wars. Indeed, this 
relationship turns out to be a common feature of many conflict data distributions, 
including more fine-grained data on individual terrorist attacks (Bohorquez et al. 
2009; Cederman 2003; Clauset et al. 2007). However, it is not universal, and it does 
not hold for all types of conflict. As can be seen in Fig. 2, the fit is much less 
compelling for civil wars, where we see “too few” severe conflicts in the tail for the 
observed data to fit well with what we would expect under a power-law distribution. 

Fig. 2 Frequency-severity (i.e., casualty) distribution for wars, based on the expanded war data 
from Gleditsch (2004), doubly logarithmic scale 

Skeptics may wonder why this should be regarded as an interesting finding. One 
way these results can be useful is to assess the expected frequencies of specific types 
of events. For example, 9/11 is often portrayed as being an unprecedented “Black 
Swan” event, following the terminology of Taleb (2007). Clauset and Woodard 
(2013) show that the likelihood of observing an event with the same magnitude 
as 9/11 since 1968 based on the observed data is as high as 11-35%, depending on 
the specific assumptions used. The fact that tail events are more likely than many 
anticipate based on the apparent “typical” conflict is a stark reminder of how major 
conflicts such as World War I can emerge, even when observers see “no clouds on 
the horizon.” Furthermore, finding that observed events do not fit a power law can 
also be useful for thinking about possible causes. For example, the poor fit for civil wars 
suggests that there must be some limiting factors that may prevent civil wars from 
escalating to more severe conflicts at the same rate as interstate wars (Miranda et al. 
2016). Non-state actors may, for instance, have limited resources to increase the war 
effort or hard constraints on their ability to escalate conflict beyond a certain level. 

A second model considered by Richardson pertains to the timing of wars. Richardson 
(1960) found that outbreaks by year were consistent with a Poisson process, a 
common model for independent random events, where “there is a constant very 
small probability of an outbreak of war somewhere on the globe on every day” 
(p. 243). More formally, the probability of observing n wars in an interval such as a year, given a 
small probability of a war breaking out p (the expected number of onsets in the interval), will be $e^{-p} p^{n} / n!$. Given this formula, as long as 
p is small, we are most likely to see years without onset, followed by years with a 
single onset, and the likelihood of seeing a year with n (or more) wars falls quickly 
the higher the value of n. The idea that conflict outbreaks are random does not 
sit easily with traditional theories of international conflict. However, many other 
analyses have found that it is very difficult to reject the simple Poisson model for 
common conflict data sources (Gleditsch and Clauset 2018; Mansfield 1988). 
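
The shape of the Poisson model can be checked directly with the standard Poisson probability mass function, $e^{-p} p^{n} / n!$. In the Python sketch below, the value `p = 0.3` is a hypothetical expected number of war onsets per year, chosen only for illustration and not estimated from any dataset.

```python
import math


def poisson_pmf(n, p):
    """Probability of exactly n onsets in an interval with expected count p."""
    return math.exp(-p) * p ** n / math.factorial(n)


p = 0.3  # hypothetical expected number of war onsets per year
for n in range(4):
    print(n, round(poisson_pmf(n, p), 4))
```

For any small p, years without an onset are the most likely outcome, followed by years with a single onset, and the probabilities fall off quickly as n grows.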

Again, skeptics may wonder why this analysis is relevant. The Richardson model 
of timing has become important again in the recent debate over trends in conflict. 
There has been a great deal of research on the apparent decline in warfare and 
organized violence, especially after the Cold War. Prominent books by Goldstein 
(2011) and Pinker (2011), for example, show an observed decline in the number 
of wars and the number of people killed in war and discuss possible causes. Most 
people accept that the observed data indeed indicate a decline (see, e.g., Gleditsch 
and Clauset 2018). However, there is more controversy over whether the observed 
data provide strong evidence for a trend, or shifts in the underlying distribution of 
conflict. How do we know that we have not had a spell of good luck, and how 
confident are we that the number of conflicts would remain low? Under just a 
slightly different turn of events, for example, the Cuban Missile crisis could have 
escalated to a severe conflict (Gaddis 2005). Whether we deem trends to endure is 
of course to a large extent a question of theory, and here I will focus mainly on the 
statistical aspects of assessing trends. We are used to seeing the historical record 
as a population, and many find it odd to discuss alternative worlds (Tetlock 1999; 
Tetlock and Belkin 1996). However, if we think of conflict outbreak as a stochastic 
process, then it is entirely possible to see a decline of conflict over a period, even if 
there is no change in underlying frequency of conflict. 
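
A small simulation can illustrate this point. The Python sketch below generates many alternative 50-year histories from a Poisson process whose rate never changes and counts how often the second half of a history nonetheless shows a marked drop in onsets. The rate of 0.5 onsets per year, the 70% cutoff for a "decline," and the function names are assumptions chosen for illustration only.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw one Poisson count using Knuth's multiplication method."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def share_of_apparent_declines(runs, years, rate, rng):
    """Share of simulated histories whose second half has at most 70% of
    the onsets seen in the first half, although the rate never changes."""
    half = years // 2
    declines = 0
    for _ in range(runs):
        counts = [poisson_draw(rate, rng) for _ in range(years)]
        first, second = sum(counts[:half]), sum(counts[half:])
        if first > 0 and second <= 0.7 * first:
            declines += 1
    return declines / runs


rng = random.Random(7)  # fixed seed so the sketch is reproducible
share = share_of_apparent_declines(runs=2_000, years=50, rate=0.5, rng=rng)
print(share)
```

Under these assumptions, a non-trivial share of the simulated histories shows an apparent decline purely by chance, which is why an observed drop alone is weak evidence of a change in the underlying process.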

Whether we can reject a model of no change based on the independent outbreak 
and power-law distributions has recently been explored in two papers by Cirillo and Taleb 
(2016) and Clauset (2018). Although there are a number of innovations in analysis 
and data compared to Richardson, they both consider variants of the timing and 
frequency-severity models that we have seen. In brief, Cirillo and Taleb argue that 
we cannot in principle say anything about trends since severe conflicts are so rare. 
They calculate that for a conflict with five million casualties, the expected waiting 
time between conflicts would be over 93 years. Based on this, one might argue 
that one cannot draw any conclusions about notable trends just from observing a 
decline. Clauset also tries to test for evidence of shifts in the distribution after 1945. 
He finds some evidence that the most severe conflicts may be less common, but not 
sufficiently strong evidence to reject the no change null hypothesis. Oddly enough, 
changes such as nuclear weapons, the growth of the number of states, and all types 
of nonstationary factors we think influence war, such as democracy and trade, appear 
to have had no impact on the distribution of conflict. 

Other scholars have started to examine a broader range of conflicts at the 
lower end of the distribution, and whether there is evidence of changes in the 
distribution more recently than 1945 (the only period considered by Clauset), using 
change-point detection techniques. For example, Hjort (2018) finds evidence for a 
break in the distribution in 1965, which coincided with the opposition to the Vietnam 
War and the hippie movement, so perhaps Woodstock had a longer legacy. Focusing 
on ethnic civil war, Cederman et al. (2017) find evidence for a change point in the 
series in the late 1990s, and also provide evidence that the change appears to be 
due to greater ethnic accommodation. Just as civil wars can be promoted by ethnic 
exclusion, we are less likely to see an onset of conflicts after changes towards ethnic 
accommodation and more likely to see conflict termination. 


3 Data and Progress in Conflict Research 


There have of course been many other important developments in conflict research 
beyond research on trends. However, one might perhaps also contend that the 
extent of progress has not been proportional to effort, or at least it has been 
more limited than the very high aspirations. There has been a great deal of path 
dependence, where existing data are simply duplicated, without innovation and 
further refinements. For a long time, there was a dominant tendency to let often 
ill-defined traditional theories of conflict guide empirical inquiry, and much ink 
has been spilled on investigating vague notions from the realist school of thought, 
suggesting that conflict must be some kind of function of the distribution of power 
across states in the system (Singer 1980). Many analyses have sampled on the 
dependent variable and just looked at conflict cases, without considering non-war 
cases or explicit baseline models (Most and Starr 1982). 

However, there has undeniably also been a great deal of progress, and much 
of this has been driven by data developments interacting with theory development 
(Gleditsch et al. 2014). For example, the early efforts to come up with more explicit 
lists of states made it possible to define populations of potential actors, and to derive better 
explicit models of the opportunities for conflict among individual states or dyads 
(Bremer 1992). Data on the geography of states have similarly led to a great deal of 
interesting research on the role of borders, distance, and conflict (Starr and Most 
1983). Data on political institutions and economic exchange helped spur the wave 
of research on liberal peace, or the possible restraining effects of institutions or 
interdependence on the use of force (Oneal and Russett 2001; Simowitz 1996). 
This has in turn led to new interest in using network approaches to understand how 
individual states are embedded in larger networks of interdependence beyond the 
dyad, as well as new methods for dealing with temporal and spatial interdependence 
in statistical analyses (Beck et al. 1998; Kinne 2009). Van Holt et al. (2016) conduct 
a more formal analysis of scientific influence in conflict research based on citation 
patterns. Their findings are visualized in terms of paths between influential articles 
and common topics in Fig. 3. It is clear from Fig. 3 that many of the influential 
articles in the graph on interstate conflict are precisely those that describe new 
datasets or analytical methods. Notable examples include Jaggers and Gurr (1995), 
which introduced the Polity democracy data prominent in studies of the democratic 
peace, and Bremer (1992), which was one of the first articles to propose systematic 
approaches to dyadic analysis of the onset of Militarized Interstate Disputes (Jones 
et al. 1996). The article presenting the first version of the Uppsala/PRIO armed 
conflict data extended back to cover the period prior to 1990 stands out as central 
in the upper right section of the figure on interstate conflict (Gleditsch et al. 2002). 

Fig. 3 Critical path and scientific influence in conflict science, reproduced from Van Holt et al. 
(2016) 

The specific topics of interest have clearly changed over time. It is notable that 
the entries in the section of the graph for intrastate conflict in Fig. 3 have much more 
recent publication dates, and the history of research on civil war differs notably from 
interstate conflict research. In general, quantitative research on civil war suffered 
much less from a legacy of traditional theories. In the mid-2000s, there was a lot of 
interest in trying to develop more disaggregated data on civil war, in part promoted 
by a collaborative network of conflict research in Europe which generated a special 
issue of the Journal of Conflict Resolution (Cederman and Gleditsch 2009). We have 
seen the development of new data that disaggregate and identify the specific actors 
involved in conflict (Cunningham et al. 2009), identify more detailed information 
on specific attributes such as the ties of actors to ethnic groups and more detailed 
information on ethnic groups (Vogt et al. 2015), and data that provide more detailed 
information on events within conflicts and their geographical location (Raleigh et al. 
2010; Sundberg and Melander 2013). A number of utilities have also been 
developed to combine different data sources, such as the geo-spatial cell structure in 
PRIO-GRID (Tollefsen et al. 2012) and the R package MELTT to match different 
event data sources by location, time and type (Donnay et al. 2019). Moreover, there 
has been a great deal of progress in automated coding of information from text 
sources such as news media reports, which provides an opportunity for real-time 
monitoring of events (Gerner et al. 1994; Schrodt and Gerner 1994; see also Maerz 
and Puschmann, chapter “Text as Data for Conflict Research: A Literature Survey” 
in this volume). There has also been more attention to out-of-sample forecasting as a 
better approach to model and theory evaluation (Ward et al. 2010). Importantly, 
out-of-sample evaluation can help guard against the problem of in-sample overfitting, 
since increasingly complex models that fit the estimation data well will often 
perform worse out of sample than simpler models. In short, the study of civil war 
has seen rapid progress since the mid-2000s, and much of this progress has clearly 
been promoted by the development of new data sources and the interaction between 
theory and data. 
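
The point about overfitting can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic (a linear signal with a fixed, non-random noise pattern), the "simple" model is a straight line, and the "flexible" model simply memorizes the nearest training observation. It is not taken from any of the studies cited above.

```python
"""Toy comparison of in-sample fit and out-of-sample performance (synthetic data)."""


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(predict, xs, ys):
    """Mean squared prediction error of `predict` on the pairs (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the true signal is y = x. The noise is a fixed pattern
# rather than random draws, so the example is exactly reproducible.
train_x = [float(i) for i in range(10)]
train_y = [x + (2.0 if i % 2 == 0 else -2.0) for i, x in enumerate(train_x)]
test_x = [i + 0.25 for i in range(10)]
test_y = [x + (1.0 if i % 4 < 2 else -1.0) for i, x in enumerate(test_x)]

# Simple model: a straight line estimated on the training data only.
intercept, slope = fit_line(train_x, train_y)


def simple_model(x):
    return intercept + slope * x


def flexible_model(x):
    """Memorizes the training data: returns the outcome of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# For each model: (in-sample MSE, out-of-sample MSE).
results = {
    "simple": (mse(simple_model, train_x, train_y), mse(simple_model, test_x, test_y)),
    "flexible": (mse(flexible_model, train_x, train_y), mse(flexible_model, test_x, test_y)),
}

if __name__ == "__main__":
    for name, (in_sample, out_of_sample) in results.items():
        print(f"{name:8s} in-sample MSE = {in_sample:.2f}, "
              f"out-of-sample MSE = {out_of_sample:.2f}")
```

In this constructed case the memorizing model reproduces the estimation data perfectly but predicts the held-out observations worse than the straight line, which is the pattern that out-of-sample evaluation is meant to expose.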


4 The Essential Interaction Between Theory 
and Data in Conflict Research 


Although more data have helped take us further, the interaction with theory remains 
essential. Whereas the development of data tended to follow theories or initial ideas 
in the early development of conflict research, it is now increasingly common to see 
more purely data-driven projects, exploiting the vast amount of available data on 
conflict. Exploratory analysis can often be very helpful and illuminating in its own 
right, especially if it is guided by new methods that may have advantages over the 
existing approaches that have commonly been used and help illuminate new aspects. 
Yet, there are also many cases where we arguably learn less from the analyses 
conducted, even if they are very competently done from a technical perspective. 
For example, Zammit-Mangion et al. (2012) use models from geostatistics to 
model high-resolution data on events in Afghanistan, drawn from a database of 
Significant Activities (SIGACTS) compiled by the US Army and obtained via WikiLeaks. 
They argue that this framework can be helpful for detecting and predicting conflict 
dynamics such as diffusion and relocation. The model seems to have high predictive 
ability, but on closer inspection it becomes clear that much of the heavy lifting in the 
predictive ability is done by the temporal lags. There is also a discernible “ring” in 
the spatial forecasts of location, which appears to reflect how improvised explosive 
devices tend to be placed around the Highway 1/Ring Road that circles the country. 
Ultimately, the model has limited content on the motivation of the actors, and the 
framework deemphasizes conflict as interaction between antagonists. Moreover, 
since the SIGACTS data primarily record events by actors perceived as hostile by 
the US Army, these data do not contain information on the events and actions by 
coalition forces that we would need to actually study the interaction between the 
parties and how the conflict evolves as a result of this (Weidmann and Salehyan 
2013). Although data can be powerful tools to evaluate and extend theories, we need 
to avoid putting the data cart in front of the horse, or we risk developing ‘weapons 
of mass distraction’ that provide limited insights, no matter how much they appear 
to be scientific. 

In closing I would like to flag two important problems in conflict research that I think 
have not received sufficient attention and remain difficult to consider in existing data 
sources. The first is the tendency to equate conflict exclusively with violent events, 
which is very widespread in applied research on conflict. This is not consistent 
with definitions of conflict that tend to highlight incompatibilities or conflict of 
interest between actors. Boulding (1963, p. 5), for example, suggests that "[c]onflict 
may be defined as a situation of competition in which the parties are aware of 
the incompatibility of potential future positions, and in which each party wishes 
to occupy a position that is incompatible with the wishes of the other". From this 
perspective, conflict as an incompatibility could motivate the use of violence, but 
violence in and of itself is not a defining characteristic of conflict (see also chapters 
"Advancing Conflict Research Through Computational Approaches"; "Migration 
Policy Framing in Political Discourse: Evidence from Canada and the USA"; “The 
Role of Network Structure and Initial Group Norm Distributions in Norm Conflict"; 
“On the Fate of Protests: Dynamics of Social Activation and Topic Selection Online 
and in the Streets" of this volume). The requirement that conflict must be perceived 
by the actors helps to demarcate this definition from much broader definitions of 
conflict, such as structural violence, which extend the concept to situations with 
"objective" interests not necessarily experienced or understood by the actors (Høivik 
and Galtung 1971). Most and Starr (1983) provide a comprehensive review of other 
definitions of conflict, most of which have a similar emphasis on conflict of interest 
as opposed to violent action. 

The tendency to equate conflict with manifestations of organized violence has led 
some researchers to either explicitly or implicitly treat situations without violence as 
"peace." This is highly problematic, since we fail to distinguish cases where there 
are no objective conflicts of interest between actors and cases where conflicts of 
interest exist, yet do not result in the use of violence. Organized violence requires 
collective action, and all forms of efforts to initiate collective action may fail for a 
number of reasons (Sandler 1992). Even when actors have common interests on an 
issue and would benefit from a change, such as fostering regime change or replacing 
a government, they do not necessarily have sufficient private incentives to participate 
in dangerous activities. As such, there will be a temptation to free ride as the benefits 
of successful dissent would be public and cannot easily be restricted to active 
participants (Lichbach 1995; Tullock 1971). Moreover, states can deter or raise 
the costs of collective action by sanctions or retribution. But more fundamentally, 
conflict may also be waged using means other than violence, including for example 
demonstrations and strikes (see also chapter “On the Fate of Protests: Dynamics of 
Social Activation and Topic Selection Online and in the Streets" in this volume). 
Sharp (1973) and Chenoweth and Stephan (2011) document many instances of 
important campaigns waged using only non-violent means. Violent and non-violent 
tactics can be plausible substitutes, where we may not see organized violence used 
in a conflict because an actor has a comparative advantage in non-violent forms of 
contention. For example, over the last couple of years, Venezuela has seen massive 
mobilization against the Maduro government and proposed institutional changes. 
On 19 April 2017, the so-called Mother of All Marches mobilized as 
many as six million participants nationwide, according to estimates by the survey 
company Meganálisis based on traffic flow and demonstration movement data, an 
extreme relative level of mobilization in a country of just over 30 million inhabitants 
(see Lugo-Galicia 2017). Although there have been many instances of violence 
against protestors as well as occasional violent responses by protestors and riots, 
we do not have a conventional civil war in the sense of organized armed violence by 
the opposition. Yet, it would be absurd to characterize this as "not a conflict" simply 
because we do not see organized violence. 

Many studies of civil war have tried to identify potential incompatibilities 
by focusing on the political and economic status of ethnic groups. From this 
perspective, all ethnic groups that are disadvantaged in a given state could be seen 
as potential conflict situations where there exist plausible grievances against the 
state and motives for dissent. Yet, conflict is a much more general concept than this. 
First, many violent conflicts are not ethnic, and the share of violent conflicts that are 
clearly ethnic has arguably fallen. Figure 4 displays the share of ongoing armed civil 
conflicts in the Uppsala/PRIO Armed Conflict Data that are deemed to be ethnic, 
based on the ACD2EPR project, which links actors in ongoing armed conflicts to ethnic 
groups in the Ethnic Power Relations data according to whether organizations make 
claims on behalf of ethnic groups. As can be seen, the majority of armed conflicts 
could historically be considered ethnic. However, the proportion has recently 
fallen, and is now less than 50%. One important possible 
explanation here is that ethnic civil wars have declined precisely because we see less 
of the ethnic discrimination and exclusion that promotes violence. Cederman et al. 
(2017) provide evidence that countries with changes toward greater accommodation 
and inclusion generally have lower rates of subsequent onset and higher likelihood 
of termination in ongoing conflict than countries that have not seen changes. 

If we wish to study potential conflict outside the ethnic realm, the limitations of 
focusing only on violence become more apparent. The large non-violent campaigns 
reported in existing data sources tend to be non-sectarian campaigns against author- 
itarian rulers (Cunningham et al. 2017; White et al. 2015). There are few instances 
of large-scale direct action involving ethnic groups, although many ethnic groups 
have relied on various non-violent tactics that do not involve mass mobilization 
(Cunningham 2013). One might speculate that mass mobilization is more likely 
to be successful if it can overcome other divisions in a population, as seen in 
Syria. As such, non-violent forms of contention may be generally less likely to 
be successful for secessionist aims, and it may be more likely that actors resort 
to violence precisely when non-violent tactics are the least likely to be effective. 
Testing these conjectures adequately is very difficult with existing data, since 
these are limited either to violent conflicts or events or to large-scale mobilization 
only. The need to develop better data on incompatibilities, and on mobilization over 
them, defined independently of the use of violence is one of the major unresolved 
issues in current conflict research. 

The second problem concerns the scale of conflict. Dissent by non-state 
actors by definition must involve collective action, but the actual level of 
participation varies dramatically. Yet, many analyses just count events without 
identifying participation explicitly. For violent conflict, the scale of the conflict is 
often equated with the number of battle deaths. However, the number of battle deaths 
is not necessarily a good indicator of participation. For example, one can imagine 
that a conflict event could be brought to a halt when one antagonist mobilizes 
superior forces and successfully deters the opponent. Arguably, the number of 
casualties following the Warsaw Pact invasion of Czechoslovakia in 1968 was 
limited precisely because Czechoslovak First Secretary Dubček ordered people not to resist 
the superior invading military forces (see Fraňková 2017). 

More generally, participation is an essential feature of interest in its own right, 
and arguably key to the outcome of the contentious events. There is considerable evi- 
dence that activists and organizations seeking to mobilize in dissent see maximizing 
participation as one of their key objectives—in the words of Popovic (2015, p. 52) 
“in a nonviolent struggle, the only weapon that you're going to have is numbers". 
A standard approach in conflict research is to count numbers of reported events 
as a measure of the magnitude of events. The idea is that a situation where we 
have a higher number of reported events is experiencing a more extensive and 
significant conflict episode. Although it may well be correct that a conflict with 
more extensive reporting sees more events, there is no necessary theoretical or 
empirical relationship between the number of events and the extent of participation 
in them. For example, a group that can mobilize a very large number may focus 
on a single large event, while groups that can only mobilize smaller numbers may 
carry out many smaller-scale events by the same participants. By counting events, 
we may erroneously conclude that the latter is a "larger" conflict, even if it involves 
fewer participants. These examples are not contrived hypothetical examples, but 
reflect real concerns. Biggs (2018) examines the relationship between the number 
of strikes and actual participants in strikes, using data from the USA, and shows 
that the two measures are not highly correlated. Many discussions of event data 
have been very concerned about the possible selection biases or the problem that 
smaller events may be omitted due to, for example, media biases (Weidmann 2016). 
However, if we scale events by size then it should be easier to get differences in 
orders of magnitude right, even if there is uncertainty, and we are generally less 
concerned about the influence of possible noise at the very low end. 
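
The contrast between counting events and measuring participation can be made concrete with a short Python sketch. The groups and participant numbers below are invented purely for illustration and do not come from any dataset discussed in this chapter.

```python
# Hypothetical event records: (group, reported participants).
events = [
    ("group_a", 200_000),   # one very large demonstration
    ("group_b", 500),       # five small events, possibly by the same people
    ("group_b", 400),
    ("group_b", 600),
    ("group_b", 450),
    ("group_b", 550),
]


def summarize(records):
    """Return {group: (number_of_events, summed_participants)}.

    The summed figure counts participant-events: people who attend several
    events are counted once per event, so it is an upper bound on the number
    of distinct participants.
    """
    summary = {}
    for group, participants in records:
        count, total = summary.get(group, (0, 0))
        summary[group] = (count + 1, total + participants)
    return summary


summary = summarize(events)

# Ranking the two hypothetical groups by each measure gives opposite answers.
largest_by_count = max(summary, key=lambda g: summary[g][0])
largest_by_size = max(summary, key=lambda g: summary[g][1])

if __name__ == "__main__":
    print(summary)
    print("largest by event count:", largest_by_count)
    print("largest by participation:", largest_by_size)
```

Even with the generous upper bound for the second group, the event count and the participation measure rank the two hypothetical groups in opposite order.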

In addition to the theoretical problems in using event counts as a measure of scale, 
there are also a number of practical issues arising in delineating what constitutes 
one "event" as opposed to two or more "separate" events. Many event data projects 
use different types of “deduplication” efforts to determine whether different reports 
refer to the same or different events, typically considering events to be “the same” if 
they fall on the same date. However, there is no guarantee that this will work, and 
report dates are often ambiguous as to whether they give the day of 
reporting or the day on which the events described occurred. Even worse, there is plausibly 
an inverse relationship between size and granularity in some data sources. The 
Social Conflict Analysis Database (SCAD) provides a much-used event dataset, which 
extends data on exclusively violent conflict and data limited to large-scale organized 
non-violent campaigns by providing more detailed event data on social conflict as 
well as geographical location information (Salehyan et al. 2012). However, many 
events in the SCAD data are coded as nationwide, where the number of simultaneous 
events across a country is deemed to be so large that coders can no longer identify 
exhaustively all the individual events. The nationwide events are likely to have 
more participants than smaller events that are easy to identify as discrete events, 
yet analysts counting events may count the former as “less significant” since they are 
reflected in fewer event counts. 
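
The fragility of date-based deduplication can be sketched in a few lines of Python. The rule shown here (same location and same date means same event) is a deliberately naive stand-in, not the procedure of any particular event data project, and the reports, including the `true_event` label that would be unobserved in practice, are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical reports: (true_event, location, date_given_in_report).
# `true_event` is included only to make the errors of the naive rule visible.
reports = [
    ("march", "City A", date(2015, 3, 1)),
    ("march", "City A", date(2015, 3, 2)),   # same march, dated by day of reporting
    ("rally", "City B", date(2015, 3, 1)),
    ("strike", "City B", date(2015, 3, 1)),  # distinct event on the same day
]


def cluster_by_date(records):
    """Naive deduplication: reports sharing location and date count as one event."""
    clusters = defaultdict(set)
    for true_event, location, day in records:
        clusters[(location, day)].add(true_event)
    return dict(clusters)


clusters = cluster_by_date(reports)

# Clusters that wrongly merge distinct events into one.
false_merges = [key for key, labels in clusters.items() if len(labels) > 1]

# Events that are wrongly split across several clusters.
appearances = defaultdict(int)
for labels in clusters.values():
    for label in labels:
        appearances[label] += 1
false_splits = sorted(label for label, n in appearances.items() if n > 1)

if __name__ == "__main__":
    print("clusters:", clusters)
    print("wrongly merged:", false_merges)
    print("wrongly split:", false_splits)
```

In this constructed example the rule both splits a single march into two events, because one report carries the reporting date, and merges two distinct events that happen to share a day and a place.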

I think these are genuine problems, but in keeping with the theme of theory- 
data interaction here, I am also relatively optimistic about our ability to find useful 
approaches to overcome them. With regards to identifying incompatibilities, there 
is much that can be done to identify conflict constellations using methods such 
as expert surveys or automated content analyses. For example, recent work on 
conflict prediction using topic modelling suggests that it may be possible to identify 
anti-government claims in news media sources (Mueller and Rauh 2018). Similar 
types of content analysis techniques could be useful for identifying cleavages or 
contention more generally, separately from violence or large-scale mobilization. 
With regards to counting participation, alternative coding approaches are also 
under active development, using photos from which crowd density can be assessed or 
geolocated social media data such as Twitter (Barberá et al. 2015; González-Bailón 
and Wang 2016; Won et al. 2017). Since many things scale, it turns out that we 
can again use proportionality measures to infer participation. Steinert-Threlkeld 
(2018) notes that this tracks participation well in the so-called Women's Marches in the 
USA. 


6 Conclusion 


In this chapter I have reviewed some examples of the interaction between data 
development and theoretical progress in the field of conflict research. I hope that 
I have successfully shown that data in some cases may have preceded theory, but 
in most cases data have been collected and developed in direct response to initial 
theoretical beliefs and hunches. However, the availability of data has often led to 
theoretical re-evaluations and progress; initial hunches may not be fully supported, 
while other findings lead to new puzzles or research questions. I hope this overview 
can give some sense of the excitement that I am left with over the progressive nature 
of interaction between theory and data in conflict research and the maturity of the 
field. Future central data resources are likely to come from new technologies or 
sources that have been difficult to use in the past. For example, satellite images are 
now readily available, and also relatively easy to analyze on a standard computer. 
Such data can be used to extract information on features for which no meaningful 
official data exist, such as variation in local income and wealth in countries 
with poor infrastructure and governance (see Jerven 2013; Weidmann and Schutte 
2017). Much information, including material that was previously classified, can 
now be extracted from digital sources and rapidly disseminated on the internet, and 
advances in text analysis and extraction make it much easier to conduct systematic 
analysis of such data sources (see, e.g., Biggs and Knauss 2011; Deutschmann 
2016). Simulation can provide an important complement to limited observed data, 
and counterfactual computational analysis can be particularly compelling if it is 
linked to clear theoretical arguments and grounded in known empirical information 
(Cederman 1997; Tetlock and Belkin 1996). It is difficult to predict—especially 
about the future. I make no claim to be able to predict specific new scientific 
innovations or salient new topics with much confidence, but I am very confident that 
new data sources and methodologies for data development will figure prominently 
in a future updated version of a graph of scientific influence in conflict research akin 
to Van Holt et al. (2016). 


A.1 Appendix: Key Contemporary Data Sources, 
Listed Alphabetically 


Armed Conflict Location & Event Data Project (https://www.acleddata.com/). A 
disaggregated conflict data collection, with dates, actors, types of violence, loca- 
tions, and fatalities of reported political violence and protest events. The ACLED 
data are not global, but cover a number of countries in Africa, Asia, and the Middle 
East. 

Correlates of War Project (http://www.correlatesofwar.org/). Provides access to 
episodic data on interstate wars and militarized interstate disputes. The COW project 
also collects data on various state-based characteristics such as military capabilities 
and diplomatic ties between states. 

Global Database of Events, Language, and Tone (https://www.gdeltproject.org/). 
Provides access to machine coded event data from electronic sources from 1979 to 
the present, using the Conflict and Mediation Event Observations (CAMEO) coding 
scheme. 

Global Terrorism Database (https://www.start.umd.edu/gtd/). Provides access to 
data on terrorist attacks since 1970, as well as some supplementary data sources on 
terrorist group profiles. 

Integrated Crisis Early Warning System (https://dataverse.harvard.edu/dataverse/ 
icews). Daily event data coded from electronic news sources, with actor, event, and 
location identifiers. Note that the most recent public version of the data has a 1-year 
embargo. 

Phoenix [Cline Center Historical Phoenix Event Data] (https://clinecenter. 
illinois.edu/project/machine-generated-event-data-projects/phoenix-data). Event 
data for the period 1945-2015, machine coded from 14 million news stories from 
the New York Times (1945-2005), the BBC Monitoring's Summary of World 
Broadcasts (1979-2015) and the CIA's Foreign Broadcast Information Service 
(1995-2004). 

Phoenix [Real time Phoenix data] (http://eventdata.utdallas.edu/data.html). A 
real time machine coded event dataset complementing the historical data, available 
from October 2017. 

Non-violent and Violent Campaigns and Outcomes (https://www.du.edu/korbel/ 
sie/research/chenow_navco_data.html). Provides access to an influential dataset that 
also documents non-violent mobilization over maximalist claims on a government. 

Social Conflict Analysis Database (https://www.strausscenter.org/scad.html). 
Provides access to event data on protests, riots, strikes, inter-communal conflict, 
government violence against civilians, and other forms of social conflict not 
systematically tracked in other conflict datasets. SCAD currently includes 
information on social conflicts from 1990-2017, covering all of Africa and now 
also Mexico, Central America, and the Caribbean. 

Uppsala Conflict Data Program (https://ucdp.uu.se/downloads/). Provides access 
to data on various types of violent conflicts, including state-based interstate and 
intrastate conflict, violence against civilians, and non-state/inter-communal conflict, 
as well as geo-referenced event data. 


Abstract Computer-aided text analysis (CATA) offers exciting new possibilities 
for conflict research that this contribution describes using a range of exemplary 
studies from a variety of disciplines including sociology, political science, commu- 
nication studies, and computer science. The chapter synthesizes empirical research 
that investigates conflict in relation to text across different formats and genres. This 
includes both conflict as it is verbalized in the news media, in political speeches, 
and other public documents and conflict as it occurs in online spaces (social media 
platforms, forums) and that is largely confined to such spaces (e.g., flaming and 
trolling). Particular emphasis is placed on research that aims to find commonalities 
between online and offline conflict, and that systematically investigates the dynam- 
ics of group behavior. Both work using inductive computational procedures, such 
as topic modeling, and supervised machine learning approaches are assessed, as 
are more traditional forms of content analysis, such as dictionaries. Finally, cross- 
validation is highlighted as a crucial step in CATA, in order to make the method as 
useful as possible to scholars interested in enlisting text mining for conflict research. 


1 Introduction 


It is by now an old adage that the internet has transformed many areas of 
social life, from industry and politics to research and education. Computational 
techniques have benefited from this development through the rapid growth in open 
source software and cloud computing, both of which make computational approaches 
considerably simpler and less costly for social scientists to implement. However, 
there has also been a rapid growth in the availability of content to study (that 
is, text, images, and video) which is 
of relevance to social scientific inquiry. Focusing on text, such data range from 
administrative documents and digitized books to social media posts and online 
user comments. They also include traditional research data, such as open survey 
responses and interview transcripts, which may be scrutinized with computational 
techniques. 

The field of computer-aided text analysis (CATA) subsumes methods used to 
study such data. The goal of this chapter is to provide an overview of computational 
methods and techniques related to the area of (semi)automated content analysis 
and text mining, with emphasis on the application of such approaches to conflict 
research. We describe three central areas of CATA in order of their respective age: 
techniques relying on dictionaries and simple word counting, supervised machine 
learning (SML), and unsupervised machine learning (UML). While doing this, we 
provide a survey of published studies from a variety of fields that implement CATA 
techniques to study conflict. We then proceed to address issues of validation, a 
particularly important area of CATA. 

Throughout the chapter, we offer a host of examples of how the application of 
CATA may advance conflict research. Our working definition of conflict in this 
chapter is twofold: we cite studies using CATA to study violent conflict on a regional 
or national level, usually by means of relating textual data that applies to a particular 
actor (for example, a country) to some indicator of violence. Such studies aim to 
uncover hidden relationships between issues, frames, and rhetoric on the one side 
and violent conflict on the other. The second branch of studies that we cite are those 
where conflict is non-violent but there is a considerable aggressive potential, for 
example, in online hate speech campaigns, cyber mobbing, and social media flame 
wars. Such studies are as diverse as their respective objects, but a commonality is 
that because there is usually ample data to document the conflict, CATA may be used 
to draw a precise picture of the actors, issues, and temporal dynamics. By presenting 
both branches of physical and virtual conflict research side by side, we do not imply 
that one follows from the other, but rather that the same approaches may be useful 
in studying both. 

The techniques that we describe can be seen as existing on a continuum, 
from approaches that are more deductive in nature and presuppose very detailed 
domain knowledge and precise research questions/hypotheses, such as dictionary 
analysis, to (more) inductive methods such as unsupervised learning that are more 
suitable for exploration (cf. Table 1). The latter methods also tend to be more 
computationally resource-intensive than the former, though this will only really 
be felt when truly large volumes of data are analyzed on a regular desktop or 
laptop computer, and they tend to be more opaque and subject to interpretation 
than simple dictionary techniques, which have been in use for decades. However, 
this is not truly a dichotomy, as in typical CATA workflows multiple methods are 
often combined in different stages of the research. This can serve the purpose 
either of developing one resource based on the output of another (for example, 
developing a topical dictionary based on the results of unsupervised machine 
learning) or of validating a particular technique with another.¹ 

Table 1 CATA methods for conflict research, adapted from Boumans and Trilling (2016, p. 10) 

                        Dictionary                 Supervised (SML)           Unsupervised (UML)
Typical research        Sentiment analysis of      Relatively homogeneous,    Large amounts of
contexts/material in    documents from opposing    numerous texts, e.g. from  unexplored material,
conflict studies        parties or extremist       newspapers; sentiment      e.g. from field
                        groups, or time series of  classification of          research, official
                        sentiment fluctuation      sentences or paragraphs    documents, social media
Common statistical      Counting of word           Support vector machines,   (Structural) topic
procedures              frequencies, string        Naive Bayes, neural        modeling, latent
                        comparisons                networks                   Dirichlet allocation
Reasoning               Deductive <----------------------------------------------> Inductive

Similar overviews of CATA for other fields have been provided before, for 
example, in political science, communication studies, and sociology (Boumans and 
Trilling, 2016; Grimmer and Stewart, 2013; DiMaggio, 2015). We aim to extend this 
body of work with an overview of CATA applications in conflict research that will be useful to 
computational social scientists aiming to use CATA in their work. 


2 Dictionary Approaches for Conflict Research 


Dictionary methods are among the oldest techniques employed in text mining and 
automated content analysis in the social sciences (Stone et al., 1966) and are popular 
in part due to their simplicity and transparency when compared with more recent 
methods (Grimmer and Stewart, 2013). In fact, dictionary approaches are both 
comparatively easy to interpret and computationally cheap, making them popular 
across a wide range of fields and research subjects. Dictionary approaches rely on 


lFor a comprehensive, hands-on introduction to CATA with Python for social scientists see Trilling 
(2018), for a similar introduction with R (in German) see Puschmann (2018). 
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the frequency of specific words (those contained in the dictionary) to assign each 
document in a corpus to a category. For example, a list of words describing violent 
conflict may be used to operationalize the topic, allowing the researcher to gauge 
the level of debate of this issue over time or by actor, or such a list may be used 
to identify potentially relevant material in a larger corpus.² Specialized topical or
psychological dictionaries as they are used within the social sciences should not
be confused with linguistic techniques such as part-of-speech tagging, syntactic
parsing, or named entity recognition (NER), which also allow the reduction of words 
to aggregate categories (nouns, sentence subjects, place names, etc.), but are usually 
intended to describe linguistic form rather than social or communicative function. 

In some implementations, membership in a dictionary category is proportional to 
the number of words occurring in the text that belong to that category, while in others 
a winner-takes-all approach is used in which the document is assigned to the single 
category with the largest number of matching terms. The difference between the two 
styles is the weighting applied to the document feature matrix which contains the 
dictionary terms and the texts in which they occur, which is usually conducted after 
the words are counted (Grimmer and Stewart, 2013, p. 274). 
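
The two assignment styles can be sketched in a few lines of Python. The dictionary, its category names, and the example sentence below are invented for illustration and are not taken from any published resource; normalizing by the number of matched terms is only one of several possible weightings.

```python
from collections import Counter
import re

# Hypothetical two-category dictionary (illustrative terms only).
DICTIONARY = {
    "violence": {"attack", "bomb", "killed", "war"},
    "diplomacy": {"talks", "treaty", "ceasefire", "negotiation"},
}

def count_matches(text, dictionary):
    """Count how many tokens of `text` fall into each dictionary category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for category, terms in dictionary.items():
        counts[category] = sum(1 for token in tokens if token in terms)
    return counts

def proportional(counts):
    """Proportional membership: share of all matched terms per category."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: n / total for category, n in counts.items()}

def winner_takes_all(counts):
    """Assign the single category with the most matches (None if no match).

    Ties are resolved in favor of the category listed first.
    """
    if not counts or max(counts.values()) == 0:
        return None
    return max(counts, key=counts.get)

doc = ("After the attack, talks on a ceasefire began, "
       "but the war went on and a bomb killed two.")
counts = count_matches(doc, DICTIONARY)   # raw counts per category
shares = proportional(counts)             # weighted, proportional membership
label = winner_takes_all(counts)          # single best category
```

Both functions start from the same raw counts, which mirrors the point that the weighting is applied only after the words have been counted.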

The increasingly popular method of sentiment analysis—also called opinion 
mining in computer science and computational linguistics—is in many cases a 
simple variant of dictionary analysis in which the dictionary terms belong to one 
of two categories, positive or negative sentiment (although sentiment dictionaries 
with three or more types of sentiment also exist). Sentiment dictionaries exist in 
many variants across languages,³ text types, and applications and are often quite
comprehensive when compared with specialized topical lexicons. In the case of 
binary classification (which applies to many forms of sentiment analysis), the 
logarithm of the ratio of positive to negative words is often used to calculate a 
weighted composite score (Proksch et al., 2019). 
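
A minimal sketch of such a log-ratio score is given below. The additive smoothing constant is an assumption made here so that the score remains defined when a text contains no positive or no negative terms; it should not be read as the exact specification used in the cited study.

```python
import math

def log_ratio_sentiment(n_positive, n_negative, smoothing=0.5):
    """Logarithm of the ratio of positive to negative term counts.

    `smoothing` is an assumed additive constant that avoids division by
    zero and log(0) when one of the counts is zero.
    """
    return math.log((n_positive + smoothing) / (n_negative + smoothing))

balanced = log_ratio_sentiment(10, 10)   # equal counts give a score of zero
positive = log_ratio_sentiment(30, 10)   # more positive terms: score above zero
negative = log_ratio_sentiment(10, 30)   # more negative terms: score below zero
```

The score is symmetric around zero, so swapping the two counts only flips its sign.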

Other dictionaries also exist in a wide variety of shapes and formats, and for a 
large number of different applications (Albaugh et al., 2013; Burden and Sanberg, 
2003; Kellstedt, 2000; Laver and Garry, 2000; Young and Soroka, 2012). These 
include policy areas, moral foundations and justifications, illiberal rhetoric as well 
as place and person names, and other strongly standardized language use. Such off- 
the-shelf dictionaries provide a level of validity by being widely used and (in some 
cases) even being able to assign material in different languages to similar categories 
by having corresponding word lists for each category (Bradley and Lang, 1999; 
Hart, 2000; Pennebaker et al., 2001). Dictionaries can also be created through a 
variety of techniques, including using manually labeled data from which the most 
distinctive terms can be extracted. 

A key strength of dictionary approaches (and of both supervised and unsuper- 
vised learning) is their ability to reduce complexity by turning words into category 


² See Payson Conflict Study Group (2001) for such a list.

³ See, for example, Proksch et al. (2019) for a multilingual sentiment analysis based on automatically
translated dictionaries.

distributions. The basis of this approach is what is known as the bag-of-words
philosophy of text analysis which turns a sequence of words and sentences into 
an undifferentiated "bag" which records only the frequency of each word within 
each text, but no information on where in a text a particular word occurs (Lucas 
et al., 2015, p. 257). Oftentimes this is not a hindrance, as in most quantitative 
research designs scholars will be interested primarily in distilling some aggregate 
meaning from their data, rather than retaining its full complexity. This decision 
entails a number of trade-offs, however, from a loss of structure and meaning that 
occurs when a text is pre-processed and cleaned to the alignment of the dictionary 
categories with the specific meaning of the material under study. The loss of 
syntactic information and argument structure is also an important limitation in bag-of-words
approaches, which are often used in dictionary analysis (though dictionaries
of n-grams are both technically possible and in widespread use). 
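
The loss of word order, and the way n-grams recover part of it, can be illustrated with a short sketch. The two example sentences are invented, and whitespace tokenization is a simplifying assumption.

```python
from collections import Counter

def bag_of_words(text):
    """Unigram counts; the position of each word is discarded."""
    return Counter(text.lower().split())

def bag_of_ngrams(text, n=2):
    """Counts of n consecutive tokens, which retain local word order."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

a = "rebels attacked the army"
b = "the army attacked rebels"

# The two sentences describe opposite events but share the same unigram bag.
same_unigrams = bag_of_words(a) == bag_of_words(b)

# Their bigram bags differ, because bigrams keep track of who precedes whom.
same_bigrams = bag_of_ngrams(a) == bag_of_ngrams(b)
```

Here `same_unigrams` is true while `same_bigrams` is false, which is exactly the information about argument structure that a pure unigram representation cannot retain.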

Dictionaries have long played an important role in conflict research. Baden and 
Tenenboim-Weinblatt (2018) rely on a custom-built cross-linguistic dictionary of 
more than 3700 unique concepts, including actors, places, events, and activities 
which they use to study the media coverage of six current violent conflicts in 
domestic and international media over time. While compiling such a dictionary is 
burdensome, machine translation can be used to turn a mono-linguistic dictionary 
into one covering corresponding concepts across languages. Person and place 
names, specific events, and actions can all be captured by such a dictionary 
with relative accuracy, underlining why such a simple approach can be extremely 
effective (Baden and Tenenboim-Weinblatt, 2018), though translation always needs 
careful validation from experts. A broadly similar approach is used by Brintzenhoff 
(2011), who relies on proprietary software to identify instances of violent conflict.
There are also examples of studies that rely on data mining to generate dictionaries 
or resources similar to them. Montiel et al. (2014) present an analysis of the national 
news coverage on the Scarborough Shoal conflict between the Philippines and China 
relying on RapidMiner, a commercial machine learning software suite. A principal
component analysis distinguishes the issues that are specific to Filipino news
sources from those that are specific to Chinese ones.

Dictionaries are also used to study conflict in virtual environments. Ben-David 
and Matamoros-Fernández (2016) rely on simple word frequencies in their study 
of hate speech on the Facebook pages of extreme-right political parties in Spain. 
After cleaning the data and removing stopwords, they group posts according to 
broad thematic categories and then extract the terms that occur most frequently within
each group, yielding category descriptions of different groups of immigrants and
other "enemies." This approach is then combined with an analysis of hyperlinks
and visual data. In a broadly similar vein, Cohen et al. (2014) suggest identifying specific
categories of radicalization as they manifest in "lone wolf" terror subjects through 
a combination of ontologies such as WordNet (Miller, 1995) and dictionaries such 
as LIWC (Tausczik and Pennebaker, 2010). While their overview is rather general, 
it points to the potential of composite solutions for linking behavior and language 
use. 
As Grimmer and Stewart (2013) note, problems occur when dictionaries from
one area are applied in another domain, leading to potentially serious errors when 
the problem is not caught. The authors cite the example given by Loughran and 
McDonald (2010) in which corporate earnings reports that mention terms such 
as "cancer" or "crude" (oil) are assigned negative sentiment scores, even when 
health care or energy firms mention these terms in an entirely positive context. 
This problem may seem entirely unsurprising, but particular assumptions about 
the nature of language (and in many cases writing) lead to the belief that a 
specialized dictionary that is appropriate in one domain will also produce valid 
results in another. As the example shows, even something as presumably universal as 
sentiment is a case in point: a dictionary that is suitable for capturing the opinion of 
a consumer about a product in a review on a shopping site will not produce equally 
valid results when applied to political speeches or newspaper articles, because (1) in 
them the same words may express different meanings, (2) such texts are presumably 
much more neutral in tone to begin with, (3) such texts do not necessarily express 
the opinion of their author, but institutional viewpoints, and (4) such texts report on 
or respond to the opinions of others. 

Dictionaries should always be validated using the data to which the dictionary
is to be applied; in other words, it should not be presumed that the dictionary will
produce accurate results if it is applied to a domain that is in any way different 
from the one for which it was developed. This applies equally to off-the-shelf and 
self-made dictionaries. Systematically validating dictionary results, for example, by 
means of traditional content analysis, is one common pathway to overcoming these 
problems. 


3 Supervised Methods 


Supervised machine learning (SML) represents a significant step away from the 
useful but also quite limited methods described in the previous section, towards 
more advanced techniques that draw on innovations made in the fields of computer 
science and computational linguistics over the past 30 years. This does not mean 
that such techniques are generally superior to dictionary approaches or other 
methods that rely on word counting, but that they utilize the extremely patterned 
nature of word distributions. In particular, supervised machine learning is able 
to connect feature distribution patterns with human judgment by letting human 
coders categorize textual material (sentences, paragraphs, or short texts) according 
to specific inferential criteria and then asking an algorithm to make a prediction 
of the category of a given piece of text based on its features. Once a classifier 
has been trained to a satisfactory level of accuracy, it can be used to classify 
unknown material. The algorithm thus learns from human decisions, allowing for 
the identification of patterns that humans are able to discern, but that are otherwise 
not obvious with methods relying purely on words and word distribution patterns. 




Perhaps the most typical research design consists of a set of labeled texts
(alternatively paragraphs, sentences, social media messages, rarely more complex 
syntactic structures) from which it is possible to derive feature distributions, typi- 
cally words (alternatively n-grams, part of speech information, syntactic structures, 
emojis). First, the data is split into a training and a test data set. An algorithm then 
learns the relation of the label to the distribution of the features from the training 
data set and then applies what has been learned to the test data set. This produces a
set of metrics that allow the classifier's performance to be evaluated. If the quality of
the automated coding is deemed satisfactory (i.e., similar to or better than human
annotation) in terms of its precision and recall, the classifier can be applied to new,
previously uncoded material. There are three major uses of this basic technique,
including the validation of a traditional content analysis, the automated annotation 
of unknown material, and the discovery of structural relationships between external 
variables that prove to be reliable predictors for language use (Puschmann, 2018). 
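
The basic design (labeled texts, a training and a test set, and evaluation by precision and recall) can be sketched as follows. A multinomial Naive Bayes classifier with add-one smoothing is used here purely as a stand-in, since no particular algorithm is prescribed above; the toy texts and labels are invented, and the split is fixed by hand rather than drawn at random.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train_naive_bayes(labeled_texts):
    """Estimate class frequencies and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary

def predict(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # words unseen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def precision_recall(gold, predicted, target):
    """Precision and recall for one target category."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == target and p == target)
    fp = sum(1 for g, p in zip(gold, predicted) if g != target and p == target)
    fn = sum(1 for g, p in zip(gold, predicted) if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented, hand-labeled toy data; real projects need far more material.
train = [
    ("troops shelled the village", "conflict"),
    ("rebels attacked the convoy", "conflict"),
    ("militia fighters shelled the town", "conflict"),
    ("the team won the match", "other"),
    ("fans cheered the team", "other"),
    ("the festival opened in town", "other"),
]
test = [
    ("rebels shelled the convoy", "conflict"),
    ("fans cheered the match", "other"),
]

model = train_naive_bayes(train)
gold = [label for _, label in test]
predicted = [predict(text, model) for text, _ in test]
precision, recall = precision_recall(gold, predicted, "conflict")
```

On a test set this small the metrics carry no evidential weight; the sketch is only meant to show where the human labels enter and where the evaluation happens.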

The applications of SML to conflict research and to social science more 
broadly are manifold. In a traditional content analysis, achieving a high inter-coder 
reliability is usually a key aim, because it signals that a high degree of inter- 
subjectivity is feasible when multiple humans judge the same text by a previously 
agreed set of criteria. In this approach, the machine learning algorithm in effect
becomes an additional “algorithmic coder" (Zamith and Lewis, 2015) that can be 
evaluated along similar lines as a human would be. Crucially, in such an approach 
the algorithm aims to predict the—presumably perfect—consensus judgment of 
human coders that is treated as “ground truth.” Social scientists who rely on content 
analysis know that content categories are virtually never entirely uncontroversial. 
Since humans obviously disagree with one another, there is a risk of “garbage in,
garbage out” when training the classifier on badly annotated material. Thus, the
quality of the annotation and the linguistic closeness of the relation between content
and code are key, and the notion of “ground truth” should be treated with care.
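
Agreement between a human coder and an "algorithmic coder" can be quantified in the same way as agreement between two humans. The sketch below uses Cohen's kappa, one common chance-corrected measure; the choice of this particular coefficient is an assumption, as no specific reliability measure is named above, and the codings are invented.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Share of items on which both coders chose the same category.
    observed = sum(1 for a, b in zip(coder_a, coder_b) if a == b) / n
    # Agreement expected by chance, given each coder's category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codings of six texts by a human and by a trained classifier.
human = ["conflict", "conflict", "other", "other", "conflict", "other"]
machine = ["conflict", "other", "other", "other", "conflict", "other"]
kappa = cohens_kappa(human, machine)
```

The two coders agree on five of six items, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement rate.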

This is usually not an issue when what is being predicted is the topic or theme 
of a text. For example, Scharkow (2013) relies on SML to gauge the reliability of 
machine classification in direct comparison to human coders, comparing the topics 
assigned to 933 articles from a range of German news sources. He finds automated 
classification to yield very good results for certain categories (e.g., sports) and 
poor results for others (e.g., controversy and crime), with implications for conflict 
research. As the author points out, even for categories where the classification 
results are less reliable, the application of SML yields important findings on the 
quality of manual content analyses. Similarly, van Atteveldt et al. (2008) are able 
to predict different attributes and concepts in a manually annotated corpus of Dutch 
newspaper texts using a range of lexical and syntactic features for their prediction. 
In both cases, the SML approach yields good results because the annotation is of
high quality and the categories that are being predicted are strongly content-bound, 
rather than interpretative. 

While frequently the categories coded for are determined through content 
analysis and relatively closely bound to the text itself (themes, issues, frames, 
arguments), or can be related to social or legal norms (e.g., hate speech), it is worth
noting that any relevant metadata may be used as the label that the classifier aims
to predict. For example, Kananovich (2018) trains a classifier on a
manually labeled data set of frames in international news reports that mention taxes, 
and tests two hypotheses related to the prevalence of certain frames in countries with 
particular political systems. 

Burscher and colleagues have shown that supervised machine learning can be 
used to code frames in Dutch news articles and reliably discern policy issues 
(Burscher et al., 2014, 2015). Sentiment analysis using SML has also been applied, 
with results considerably better than those of approaches that are purely based on 
the application of lexicons (González-Bailón and Paltoglou, 2015). 

Burnap and Williams (2015) train a sophisticated supervised machine learning 
text classifier that distinguishes between hateful and/or antagonistic responses with 
a focus on race, ethnicity, or religion; and more general responses. Classification 
features were derived from the content of each tweet, including grammatical depen- 
dencies between words to recognize “othering” phrases, incitement to respond with 
antagonistic action, and claims of well-founded or justified discrimination against 
social groups. The results of the classifier draw on a combination of probabilistic, 
rule-based, and spatial classifiers with a voted ensemble meta-classifier. 
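
The idea of a voted ensemble can be illustrated with a plain majority vote over the outputs of several base classifiers. This is a generic sketch and not a reconstruction of the cited study's setup; the three lists of predictions are hypothetical.

```python
from collections import Counter

def majority_vote(votes_per_classifier):
    """Combine the labels of several classifiers by simple majority per item."""
    combined = []
    for votes in zip(*votes_per_classifier):
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical outputs of three base classifiers for the same four tweets.
probabilistic = ["hateful", "neutral", "neutral", "hateful"]
rule_based = ["hateful", "hateful", "neutral", "neutral"]
spatial = ["neutral", "hateful", "neutral", "hateful"]

ensemble = majority_vote([probabilistic, rule_based, spatial])
```

With an odd number of classifiers and two labels, no ties can occur; other configurations would need an explicit tie-breaking rule.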

Social media data can also be productively combined with demographic and geo- 
spatial data to make predictions on issues such as political leanings. For example, 
Bastos and Mercea (2018) fit a model that is able to predict support for the 
Brexit referendum in the UK based on the combination of geo-localized tweets and 
sociodemographic data. 

Though manual classification is the norm, in some cases, a combination of 
unsupervised and supervised machine learning may yield good results. Boecking 
et al. (2015) study domestic events in Egypt over a 4-year period, effectively using 
the metadata and background knowledge of events from 1.3 million tweets to train 
a classifier. 

Other approaches that connect manual content analysis with supervised machine 
learning that are presently still underutilized in the social sciences include argumen- 
tation mining. For example, Bosc et al. (2016) provide an overview of argument 
identification and classification using a number of different classifiers applied to a 
range of manually annotated Twitter data sets. Using a broader range of features in 
particular appears to increase the performance of SML techniques markedly. 


4 Topic Modeling as Unsupervised Method in Conflict Research


The main difference between supervised and unsupervised text as data methods is 
that unsupervised techniques do not require a conceptual structure that has been 
defined beforehand. As explained above, dictionary applications and supervised 
techniques are deductive approaches which rely either on a theoretically informed
collection of key terms or a manually coded sample of documents to specify what 
is conceptually interesting about the material before applying a statistical model to 
extend the insights to a larger population of texts. In contrast to this, unsupervised 
methods work inductively: without predefined classification schemes and by using 
relatively few modeling assumptions, such algorithm-based techniques shift human 
efforts to the end of the analysis and help researchers to discover latent features of 
the texts (Lucas et al., 2015, p. 260; Grimmer and Stewart, 2013).

Unsupervised text as data techniques are useful for conflict research—especially 
for understudied areas and previously unknown primary sources or the many 
rapidly growing digitized resources—because they have the potential to disclose 
underlying clusters and structures in large amounts of texts. Such new insights can 
either complement and refine existing theories or contribute to new theory-building 
processes about the causes and consequences of conflict. 

While there are several variations of unsupervised methods,⁴ our literature survey
shows that topic modeling is the most frequently used technique in conflict research. 
Common to topic modeling is that topics are defined as probability distributions 
over words and that each document in a corpus is seen as a mixture of these topics 
(Chang et al., 2009; Grimmer and Stewart, 2013; Roberts et al., 2014). The first and 
still widely applied topic model is the so-called LDA—latent Dirichlet allocation 
(Blei et al., 2003; Grimmer and Stewart, 2013). Recently, the Structural Topic Model 
(STM) has been proposed as an innovative and increasingly used alternative to the 
LDA (Roberts et al., 2014; Lucas et al., 2015; Roberts et al., 2016). Whereas the 
LDA algorithm assumes that topical prevalence (the frequency with which a topic 
is discussed) and topical content (the words used to discuss a topic) are constant 
across all documents, the STM allows covariates to be incorporated into the model,
which can illustrate potential variation in this regard (Roberts et al., 2014, p. 4).
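
The shared idea behind these models, topics as probability distributions over words and documents as mixtures of topics, can be made concrete with a small numerical sketch. All topic names and probabilities below are invented; a fitted model would estimate them from the corpus.

```python
# Two hypothetical topics, each a probability distribution over a tiny vocabulary.
topics = {
    "religion": {"faith": 0.5, "church": 0.4, "victim": 0.1},
    "violence": {"faith": 0.1, "church": 0.1, "victim": 0.8},
}

# Hypothetical topic proportions of a single document (they sum to 1).
doc_mixture = {"religion": 0.25, "violence": 0.75}

def word_probability(word, mixture, topics):
    """Probability of a word in a document: topic shares times word probabilities."""
    return sum(share * topics[topic].get(word, 0.0)
               for topic, share in mixture.items())

word_probs = {word: word_probability(word, doc_mixture, topics)
              for word in ("faith", "church", "victim")}
```

Because the document is dominated by the second topic, "victim" receives most of the probability mass, although words from the first topic remain possible.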

Typically, the workflow⁵ of topic modeling starts with a thorough cleaning
of the text corpus, as commonly done for quantitative bag-of-words analyses 
which transform texts into data. Depending on the research focus, such automated 
preprocessing includes lowercasing of all letters, erasing of uninformative non- 
letter characters and numbers, stopword removal, stemming, and possibly also the 
removal of infrequently used terms. Text cleaning procedures can have significant
and unexpected effects on the results of unsupervised analyses, which is why Denny
and Spirling (2018) recommend “reasonable” preprocessing decisions and suggest
a new technique to test their potential effects.⁶ Subsequently, researchers must
make some model specifications such as determining the number of topics (K) to 


⁴ Grimmer and Stewart (2013) provide a useful overview in this regard.

⁵ While some researchers perform topic modeling in Python (2018), the detailed vignettes as well
as online support and tutorials of packages in R (2018) such as quanteda (Benoit et al., 2018) or
stm (Roberts et al., 2018) make these tools easily accessible.

⁶ For detailed explanations of how to apply these tests see Denny and Spirling (2018).




be inferred from the corpus and—in case of the STM—the choice of covariates. 
Through Bayesian learning, the model then discriminates between the different 
topics in each document. Concretely this means, for example, that based on updated 
word probabilities, the algorithm would group terms such as “god,” “faith,” “holy,”
“spiritual,” and “church” into one topic in a document, while the same document
could also contain words such as “bloody,” “violent,” “death,” “crime,” and “victim” 
constituting a second topic. Lastly, it is the researchers' task to adequately label and 
interpret such topics and make more general inferences. 
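
The cleaning steps that precede model fitting can be sketched as follows. The stopword list, the document-frequency threshold, and the example corpus are illustrative assumptions; stemming is omitted, and the regular expression keeps ASCII letters only, which would not be adequate for most non-English corpora.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real analyses use much longer ones.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "on"}

def preprocess(texts, min_doc_count=2):
    """Lowercase, keep letters only, drop stopwords and infrequent terms."""
    tokenized = []
    for text in texts:
        # Lowercasing plus removal of numbers and non-letter characters.
        tokens = re.findall(r"[a-z]+", text.lower())
        tokenized.append([t for t in tokens if t not in STOPWORDS])
    # Number of documents in which each term occurs at least once.
    doc_frequency = Counter(t for tokens in tokenized for t in set(tokens))
    return [[t for t in tokens if doc_frequency[t] >= min_doc_count]
            for tokens in tokenized]

corpus = [
    "The attack killed 12 people in the capital.",
    "Protests in the capital followed the attack!",
    "A festival opened on 3 May.",
]
cleaned = preprocess(corpus)
```

In this toy corpus the third document ends up empty, a small reminder that seemingly routine choices such as the frequency threshold can change what the model gets to see.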

Topic modeling is a new methodological trend in conflict research: the recent
growth in studies that apply such methods points to the great potential these
innovative approaches have in this area. Examples cover a broad range of issues:
Stewart and Zhukov (2009), for instance, analyze nearly 8000 public statements by 
political and military elites in Russia between 1998 and 2008 to assess the country’s 
public debate over the use of force as an instrument of foreign and defense policy. 
The LDA analysis of Bonilla and Grimmer (2013) focuses rather on how external 
threats of using force and committing a terrorist attack influence the themes of major 
US media and the public’s policy preferences at large. Other studies applying the 
LDA algorithm scrutinize patterns of speaking about Muslims and Islam in a large 
Swedish Internet forum (Törnberg and Törnberg, 2016) or generally look into how
controversial topics such as nuclear technology are discussed in journalistic texts 
(Jacobi et al., 2016). While Fawcett et al. (2018) analyze the dynamics in the heated 
public debate on “fracking” in Australia as another example of non-violent conflict, 
Miller (2013) shows that topic modeling can also be valuable for studying historical
primary sources on violent crimes and unrest in Qing China (1722-1911). 

One central and rather broad contribution to conflict research is the study of 
Mueller and Rauh (2018). Based on LDA topic modeling, they propose a new 
methodology to predict the timing of armed conflict by systematically analyzing 
changing themes in large amounts of English-language newspaper texts (articles
from 1975 to 2015, reporting on 185 countries). The added value of using unsupervised
text-mining techniques here is that the explored within-country variation
of topics over time helps to understand when a country is at risk of experiencing
violent outbreaks, independent of whether the country had experienced conflicts
in the past. This is truly innovative because earlier studies could merely predict a 
general, not time-specific risk in only those countries where conflict had appeared 
before. Mueller and Rauh (2018, p. 359) combine their unsupervised model with 
panel regressions to illustrate that (not) reporting on particular topics increases the 
likelihood of an upcoming conflict. They show, for example, that the reference to 
judicial procedures significantly decreases before conflicts arise. 

Other recent conflict analyses apply the newly proposed STM model of Roberts 
et al. (2014, 2016, 2018). As explained above, the difference between the LDA and 
STM algorithm is that the latter allows document-level metadata to be included. Lucas
et al. (2015), for example, specify in their model on Islamic fatwas whether clerics 
are Jihadists or not. Based on this, they illustrate crucial topical differences between
both groups: Jihadists mostly talk about “Fighting” and “Excommunication”
while non-Jihadists rather use topics such as “Prayer” and “Ramadan” (Lucas et al.,
2015, p. 265). Terman (2017) uses STM to scrutinize Islamophobia and portrayals
of Muslim women in US news media. Her analysis of 35 years of journalistic
coverage in the New York Times and Washington Post (1980–2014)
on women in non-US countries shows that stories about Muslim women mostly
address the violation of women's rights and gender inequality while stories about
non-Muslim women emphasize other topics. Further conflict research that
makes use of STM includes Bagozzi and Berliner's (2018) analysis of crucial
variations over time concerning topic preferences in human rights monitoring and
Mishler et al.'s (2015) test of detecting events based on systematically analyzing
Ukrainian and Russian social media.

Validating model specifications and particularly the labeling and interpretation 
of topics as model output is an absolutely crucial part of any unsupervised text 
analysis. As Grimmer and Stewart (2013) point out, such post-fit validation can be 
extensive. However, systematic validation procedures and standardized robustness 
tests for unsupervised methods are still pending. Frequently, applications of topic 
models in conflict research and other fields of study exhibit two shortcomings in 
this regard: First, the model specification of determining the number of topics (K) 
is not sufficiently justified. Second, the labeling and interpretation of topics seem 
arbitrary due to lack of information about this process. 

The selection of an appropriate number of topics (K) is an important moment in 
topic modeling: too few topics result in overly broad and unspecific categories, too 
many topics tend to over-cluster the corpus in marginal and highly similar topics 
(Greene et al., 2014, p. 81). The general aim in this regard is to find the value of
K that yields the most interpretable topics. While there are methods and algorithms
to automatically select the number of topics (Lee and Mimno, 2014; Roberts et al., 
2018), Chang et al. (2009) show that the statistically best-fitting model is usually not 
the model which provides substantively relevant and interpretable topics. To reach 
this goal, we recommend conducting systematic comparisons of model outcomes
with different Ks, similar to Bagozzi and Berliner (2018), Jacobi et al. (2016),
Mueller and Rauh (2018), and Lucas et al. (2015). Visualizations of such robustness tests
such as in Maerz and Schneider (2019) further increase the transparency concerning 
the decision-making process of determining K. 

A valid process of labeling and interpreting topics as model outcome includes 
a thorough analysis of the word profiles for each topic. While computational 
tools can efficiently support such examinations, one should keep in mind that 
this is a genuinely interpretative and rather time-consuming act which needs to 
be documented in a comprehensible manner. The R package stm offers several 
functions to visualize and better understand the discursive contexts of topics 
(Roberts et al., 2018). This includes the compilation of detailed word lists with 
most frequent and/or exclusive terms per topic (labelTopics), the qualitative check
of most typical texts for each topic (findThoughts), or estimating the relationship 
between metadata and topics to better understand the context and interrelation of 
the topics at large (estimateEffect). In addition, Schwemmer's (2018) application
stminsights provides interactive visualization tools for STM outcomes to facilitate a
straightforward validation. In the following section, we make several suggestions
on how to further strengthen the validity of automated content analysis in
conflict research by combining topic modeling with other text-mining techniques 
and quantitative or qualitative methods. 


5 Techniques of Cross-Validation 


In their groundbreaking article on automated content analysis of political texts, 
Grimmer and Stewart (2013, p. 269) suggest four principles of this method: (1)
While all quantitative models of language are wrong, some are indeed useful. (2) 
Automated text analysis can augment, but not replace thorough reading of texts. 
(3) There is no universally best method for quantitative text analysis. (4) Validate, 
validate, validate. It is particularly the latter point which we would like to emphasize 
in this section. Automated text analysis has the potential to significantly reduce the 
costs and time needed for analyzing a large amount of texts in conflict research— 
yet, such methods should never be used blindly and without meticulous validation 
procedures that illustrate the credibility of the output. 

As we have argued above, the validation of dictionary approaches and supervised 
techniques needs to show that such methods can replicate human coding in a 
reliable manner (Grimmer and Stewart, 2013, p. 270). For unsupervised methods, 
it is important to justify and explain model specifications and demonstrate that 
the model output is conceptually meaningful. Beside these necessary steps for 
each method individually, we recommend combining dictionary approaches and
supervised as well as unsupervised techniques as efficient tools for cross-validation. 
In agreement with Grimmer and Stewart (2013, p. 281) we hold that these different 
techniques are highly complementary and suggest two strategies of designing such 
multi-method validations. The first procedure of cross-validation is rather inductive 
and particularly suitable for exploring new theoretical relations and conceptual 
structures in large amounts of hitherto broadly unknown texts. This technique 
is similar to what Nelson (2017) describes as “computational grounded theory.” 
Figure 1 provides a simplified illustration of this process, which we refer to as 
the inductive cycle of cross-validation. The starting point of this framework is 
topic modeling because it allows for an inductive computational exploration of 
the texts. Nelson (2017) calls this the pattern detection step, which subsequently 
facilitates the formulation of new theories. Based on this theory-building process, a 
targeted dictionary or coding scheme is conceptualized. The outcome of applying 
this newly developed dictionary or coding scheme can illustrate that the results of the 
preceding topic modeling are indeed conceptually valid and—to a certain degree— 
comparable to measures from supervised models (Grimmer and Stewart, 2013, 
p. 271). Furthermore, such supplementary supervised analyses are more focused
and help to illuminate specific aspects of the texts which are theoretically more
interesting than the broad outcome of the explorative topic modeling.

Fig. 1 The inductive cycle of cross-validation

The rich and original material gained during ethnographic field research is one 
example from conflict studies for which the inductive cycle would be a suitable 
approach. After having conducted open-ended surveys in a country torn by ethnic 
conflicts, for instance, one is confronted with huge amounts of unique texts which 
ask to be analyzed. Topic modeling is a fruitful start in this regard (Roberts 
et al., 2014), followed by a more fine-grained and theory-guided dictionary analysis
or supervised learning. Overall, the suggested framework allows for a thorough 
cross-validation of the different analytic steps and is a comprehensive way of 
computationally accessing new information—in this example about the nature of 
ethnic conflicts. 

The second procedure of cross-validation is a deductive approach that implies 
that the researcher has an existing theoretical framework in mind when developing 
a dictionary or coding scheme for supervised learning. Alternatively, one could also 
apply an already established dictionary to a corpus of texts for which this application 
is theoretically and substantially justified (yet, see Sect. 2 regarding the risks of
blindly adopting dictionaries for diverging fields of inquiry). As illustrated in Fig. 2, 
this first step is followed by a topic model applied to the same corpus of texts to 
additionally explore hidden features in the material that might be of theoretical 
interest but are not yet captured by the dictionary or coding scheme. The outcome of 
the topic modeling (typically a report of top terms appearing in K topics) then has
the potential to validate, but also to significantly complement and refine, the existing
dictionary or coding scheme, leading to more solid results.

Fig. 2 The deductive cycle of cross-validation


The analysis of propaganda magazines or online material published by a newly 
emerging Islamist terrorist group is one example from conflict research that 
could be adequately analyzed with the described deductive framework. Making 
use of existing theories about Islamist communication strategies or applying an 
already established dictionary that was developed to analyze Islamist rhetoric seems 
adequate to scrutinize the content of such texts in a first step. However, since 
the assumed terrorist group would be a new formation in the field of Islamist 
fundamentalism, the additional application of topic modeling could disclose so far 
unknown aspects about this group or the language of terrorists in general. This, in 
turn, contributes to further improving the existing dictionary or coding scheme and, 
overall, enables a more valid analysis. 
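
The refinement step of the deductive cycle can be sketched in a few lines of Python. The dictionary, the topic top terms, and the function name below are hypothetical placeholders and not material from an actual study. The sketch only lists those top terms of a topic model that the existing dictionary does not yet cover, which a researcher would then review by hand.

```python
def uncovered_top_terms(dictionary, topic_top_terms):
    """Return, per topic, the top terms that the dictionary does not contain."""
    known = {term.lower() for term in dictionary}
    return {
        topic: [term for term in terms if term.lower() not in known]
        for topic, terms in topic_top_terms.items()
    }


# Hypothetical dictionary and hypothetical topic-model output (illustration only).
dictionary = ["enemy", "struggle", "victory"]
topic_top_terms = {
    "topic_1": ["enemy", "victory", "homeland"],
    "topic_2": ["youth", "struggle", "video"],
}

# Candidate terms for a manual review and a possible dictionary extension.
candidates = uncovered_top_terms(dictionary, topic_top_terms)
```

Whether a candidate term enters the dictionary remains a theoretical decision; the sketch only narrows down what has to be read.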

Existing empirical analyses from related fields of research that apply a similar 
validation cycle include the study of the language of autocrats by Maerz (2019) 
or the analysis of illiberalness in the speeches of political leaders by Maerz 
and Schneider (2019). The latter further expand the validity tests to qualitative 
checks and network analysis to handle their particularly heterogeneous material. 
While we have focused here solely on a fruitful combination of various
text-as-data techniques, the inclusion of other qualitative and quantitative methods
and visualization techniques is another option to further test and illustrate the
validity of the results.

6 Conclusion


In this chapter, we discussed several CATA methods used in conflict research as new 
techniques to handle growing amounts of written on- and offline resources. Table 2 
compares the performance of the different approaches which we have described 
in the preceding sections. The first technique—dictionary applications—is rather 
straightforward and comparatively easy to apply once a theory-guided selection 
of keywords has been defined. For conflict researchers interested in text mining 
methods, this first approach might be particularly suitable if material was collected 
from a research field that is already widely covered by established theories. The 
dictionary analysis could help, for example, to further refine those theories. Yet, 
as Table 2 specifies, one disadvantage of dictionary applications is that it can 
be very challenging to justify why a certain selection of terms is more suitable 
than alternative word lists. Justifying the selection typically requires extensive
qualitative procedures to illustrate the validity of the dictionary.

The second approach we discussed is supervised machine learning. Supervised 
text mining is a more sophisticated approach than dictionary applications because 
it is not limited to a fixed list of keywords. Instead, these semi-automated methods 
make use of algorithms which learn how to apply the categories of a manually coded 
training set to larger amounts of texts. One downside of supervised learning is that 
the manual coding of the training set can be highly work-intensive. This is why we 
recommend this method for conflict researchers who are either experienced in the 
manual coding of texts or have sufficient capacities to handle this first and laborious 
step of the analysis. 

Lastly, we reviewed topic modeling as the most current unsupervised method 
applied in conflict research. Topic modeling is particularly suitable for sizable 
amounts of new texts that cannot be manually screened since these methods help 
to explore the underlying structure and topics of the hitherto unknown texts. 
While this inductive detection of topics is fully automated, the definition of model
specifications and the interpretation of the model outcome require considerable human
effort and transparency to ensure valid and non-arbitrary inferences (cf. Table 2).


Table 2 Comparing the performance of different CATA methods

               Dictionary                  Supervised (SML)            Unsupervised (UML)
Advantages     Relatively easy to apply,   Not limited to fixed        Reveals latent features
               no comprehensive cleaning   keywords, extends manually  in (hitherto unknown)
               of corpus needed            coded training set to       texts, analysis is fully
                                           large amounts of texts      automated
Disadvantages  Validation of dictionary    Laborious manual coding     Model specifications and
               challenging, re-usage of    of training set required    interpretation of model
               dictionary in different                                 outcome require high
               contexts problematic                                    human effort

As one recommendation for future text mining projects in conflict research, we
highlighted validation as a crucial element of all text-as-data methods. Ideally, tests
for validity evaluate model performance and compare the output of the model to 
the results of hand coding, illustrating that the automated analysis closely replicates 
the human-coded outcome. However, applying such procedures can be costly and 
difficult to implement in many settings. This is why we additionally suggested two 
cycles of combining dictionary approaches, supervised methods and unsupervised 
techniques to effectively cross-validate the outcome of these applications. 
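
The comparison between automated output and hand coding can be illustrated with a minimal Python sketch. The binary labels below are invented for illustration, and the function name is an assumption; the sketch simply computes accuracy, precision, and recall of the automated coding against the human-coded outcome.

```python
def validation_scores(hand_coded, automated, positive=1):
    """Compare automated labels with hand-coded labels for the same texts."""
    if len(hand_coded) != len(automated):
        raise ValueError("both codings must cover the same texts")
    pairs = list(zip(hand_coded, automated))
    true_pos = sum(1 for h, a in pairs if h == positive and a == positive)
    false_pos = sum(1 for h, a in pairs if h != positive and a == positive)
    false_neg = sum(1 for h, a in pairs if h == positive and a != positive)
    agreement = sum(1 for h, a in pairs if h == a)
    return {
        "accuracy": agreement / len(pairs),
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }


# Hypothetical codings of eight texts (1 = category present, 0 = absent).
hand = [1, 0, 1, 1, 0, 0, 1, 0]
auto = [1, 0, 0, 1, 0, 1, 1, 0]
scores = validation_scores(hand, auto)
```

Values close to 1 on all three measures would indicate that the automated analysis closely replicates the hand coding.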

Apart from extensive validation procedures, we believe that transparency in terms 
of methodological decisions and steps, accessibility to data and replication files as 
well as open access publications are critical to advance computational methods 
in conflict research and beyond. While researchers have started to follow these 
practices in providing online appendices on methodological details and robustness 
tests and making their replication files publicly available on dataverses,⁷ there is
still a large number of studies that are rather nebulous about these things, further
reinforcing the much-discussed replication crisis in the social sciences. Text-as-data
approaches are currently experiencing a hype. Yet, while plenty of innovative
tools and techniques are being developed, there is a need for platforms and
digital hubs that bundle the newly gained knowledge and make it accessible to a
broader community of researchers.⁸ Such new policies of data sharing and digital
cooperation pave the way for a more networked and progressive computational 
methodology in the social sciences. 


Appendix 


See Table 3. 


⁷ For example, https://dataverse.harvard.edu/.

⁸ First steps in this direction have been initiated by research institutes such as the Social Media and
Political Participation Lab (https://smappnyu.org/), the MediaCloud (https://mediacloud.org/), the
Berkman Klein Center for Internet and Society at Harvard (https://cyber.harvard.edu/), the Oxford
Internet Institute (https://www.oii.ox.ac.uk/) or the newly founded Digital Democracy Lab
(https://digdemlab.github.io/).

Abstract Relational event models are a powerful tool to examine how conflicts
arise or manifest through human interactions and how they evolve over time. 
Building on event history analysis, these models combine network dependencies 
with temporal dynamics and allow for the analysis of group formation patterns
(such as alliance or coalition formation processes), influencing dynamics, or social
learning. The added information on both the timing (and order) of social interactions
and the context in which social interactions take place (i.e., the broader
network in which people or actors are embedded) can provide powerful new evidence
for theorized social mechanisms. This chapter provides an overview of REMs and
showcases two empirical studies to illustrate the approach. The first study examines 
political alliance-formation patterns among countries engaging in military actions 
in the Gulf region. The REM shows that countries engage in military actions with 
other countries by balancing their relations, i.e., by supporting allies of their allies 
and opposing enemies of their allies. The second study shows that party family 
homophily guides parliamentary veto decisions and provides empirical evidence of 
social influencing dynamics among European parliaments.

68 L. Brandenberger
1 Introduction 


Conflict has an inherently social aspect. Conflicts often arise between two parties 
and are often perpetrated in a broader context with the involvement of third-party 
actors¹ (see for instance Nelson, 1989; Crescenzi, 2003; Knoke, 1994; Wasserman
and Galaskiewicz, 1994). One potential source of conflict relates to the social 
mechanism of social influencing and the social dynamics that build from it. Social 
influencing can be described as a relational process where actors modify their 
behavior or values to become more alike with the actors they interact with (for an 
overview, see Flache et al., 2017). Influencing has been described as an active force, 
where some actors try to persuade others to change their beliefs, attitudes, or even 
behavior—or as a passive force, where actors mimic values or behavior of others 
with whom they interact (Lindstädt et al., 2017; Shalizi and Thomas, 2011).

Social influencing can lead to conflicts within groups, as it brings some actors to 
do things they may not necessarily want to do (Myers, 1982; Welch and Wilkinson, 
2005). Furthermore, social influencing can lead to coalition formation, where groups 
of actors develop their own dynamics and engage more strongly with their own 
group members than with other actors outside their group (Jehn et al., 2013; Berardo 
and Scholz, 2010). This can lead to situations where if one member of a group 
stands in conflict with another actor outside the group, the entire group may develop 
a negative relation with this outside actor. By doing so, the group reinforces their 
own group cohesion. Heider (1946) summarizes these coalition formation dynamics 
in his balance theory, where he stipulates that the enemy of my friend eventually 
becomes my enemy as well (see also Newcomb (1961) and Kohne et al. in the 
chapter "Norm Conflict in Social Networks" of this book). This indicates that 
conflict dynamics can go beyond dyadic relationships and a conflict between two 
actors can escalate into a conflict between larger groups or coalitions (Hadjikhani 
and Håkansson, 1996; Crano and Cooper, 1973; Labianca et al., 1998).

A question that naturally arises is: How can we detect and examine these social 
dynamics that can lead to conflicts in social interactions? Relational event models 
(REM) can be used to study multiple social mechanisms and their explanatory 
power of the temporal dynamics behind social interactions. REMs are inferential 
models that make use of temporally fine-grained records of social interactions to 
model complex interaction patterns and endogenous processes. REMs can be used to 
detect social influencing (Malang et al., 2018), understand social exchanges (Butts, 
2008; Zenk and Stadtfeld, 2010; Quintane et al., 2014; Kitts et al., 2016; Stadtfeld 
and Geyer-Schulz, 2011), and determine causes for group or conflict formation 
processes (Lerner et al., 2013a; Leifeld and Brandenberger, 2019; De Nooy and 
Kleinnijenhuis, 2013). Building on event history analysis, REMs try to explain 
the occurrence of relational events. The use of the network approach allows 
REMs to detect complex patterns in these relational events that go beyond dyadic
dependencies (i.e., go beyond direct person-to-person interactions to include, for
instance, the effect of third parties in these patterns) (Butts, 2008).

¹ For the sake of linguistic simplicity, this chapter refers to actors as a general term for different social
entities, such as individuals, organizations, governments, groups, teams or other collective actors.

This chapter provides an overview of relational event models for the analysis
of conflict event data. First, relational events as records of social interactions are
discussed. Afterwards, REMs are presented, including how they build on event
history analysis to statistically model event occurrence. At the heart of REMs are
endogenous network statistics that operationalize social mechanisms or patterns.
The most commonly used statistics are presented in Sect. 4, together with a
discussion of the temporal aspects of REMs. Section 5 gives two empirical examples 
and discusses their operationalizations of alliance formation and social influencing. 
The chapter closes with a discussion of the limitations of REMs and their link 
to agent-based modeling through the shared use of operationalizations of social 
mechanisms of human interactions. 


2 Relational Events 


Conflict events often entail both a relational and a directional aspect. They are
relational in the sense that these events report interactions among individuals, groups,
or actors. These interactions are often directed from one party to another and signed. They
can be negative or openly conflictive in nature and reflect opposition between two 
engaging parties, for instance through an act of aggression from one party directed 
at another party. However, they can also be positive in nature and reflect support, 
for instance through the exchange of information or resources. In the latter case, the 
absence of positive interactions may be an indication of potential conflicts among 
actors not sharing resources. 

Alternatively, conflicts can also arise through surrounding issues and be recorded 
in indirect social interactions, where an active actor engages in passive issues or 
events. By looking at the surrounding involvement of other actors in these issues 
or events, a complex entanglement of actors becomes evident, where conflicts 
manifest themselves for instance in coalition structures and close-knit clusters of 
actors engaging in the same issues or events. A political debate can serve as an 
example here, where political actors take stances on different political issues, thus 
revealing their underlying coalition structure and support system (see for instance 
Leifeld (2017) and Hadjdinjak et al. in the chapter "Migration Policy Framing"
of this book). Relational event models aim at uncovering patterns that guide these 
interactions and help explain how conflicts arise or manifest themselves in social 
interactions and how they evolve over time. 

At the minimum, relational events consist of a sender node a, a target node b, 
and either a time stamp t that records the interaction in continuous time or the place
of the event in the time-ordered sequence. Once sorted in time, these events form 
a so-called event sequence—or event stream. Relational events can be expanded 
to reflect more diverse interactions. Events can be signed, for instance, to classify 
allegiance and opposition in international relations, friendship and animosity
in interpersonal interactions, or agreement and disagreement in communication
networks. Additionally, relational events can be weighted to reflect the intensity of
the interaction. For instance, in international relations, weighted events can signify 
the degree of military aggression that an event encodes. In an event sequence 
consisting of email exchanges between colleagues in a firm, the weight of each 
event could correspond to the number of characters in the email or the degree 
of friendliness in the tone of the email. Sometimes, social interactions cannot 
be weighted but allow categorization. A relational event sequence can consist of 
different types of social interactions among sender and target nodes. For instance, 
in legislative politics, an event sequence can consist of members of parliament 
referencing (or attacking) each other in speeches, supporting each other's legislative
proposals by cosponsoring them, or organizing joint press events to discuss relevant
topics with the public. The assumption guiding these different types of interactions 
is that they co-evolve and affect each other over time. 
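
The components of a relational event described above can be collected in a small data structure. The following Python sketch is only one possible representation; the class name, the field names, and the example time stamps are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationalEvent:
    sender: str                # sender node a
    target: str                # target node b
    time: float                # time stamp t (or position in the ordered sequence)
    sign: int = 1              # +1 for support, -1 for opposition
    weight: float = 1.0        # intensity of the interaction
    kind: str = "interaction"  # category of the interaction, if weighting is not possible


def event_sequence(events):
    """Sort events in time to obtain the event sequence (event stream)."""
    return sorted(events, key=lambda event: event.time)


# Hypothetical events among four actors, deliberately given out of order.
events = [
    RelationalEvent("d", "b", 7.0, sign=-1),
    RelationalEvent("a", "b", 1.0),
    RelationalEvent("b", "c", 2.0, kind="cosponsorship"),
]
sequence = event_sequence(events)
```

Only the sender, the target, and the time information are required; sign, weight, and type are optional extensions that depend on what the data-gathering process can deliver.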

In sum, relational event sequences are relatively flexible and are generally 
constrained by the data-gathering process and the degree to which social interactions 
can be quantified in a meaningful way. 


3 Relational Event Models 


The goal of REMs is to explain the temporal order of social interactions. Why do two 
people suddenly start exchanging emails? Why do two governments start engaging 
in military conflicts? Why do two members of parliament start collaborating 
with each other on a new legislative proposal? The answer to these questions is 
sometimes found in the broader context of these events. If two countries take up 
arms against each other, the alliances that form beforehand play a key role. If two 
people start exchanging emails it is possible that a mutual friend introduced them 
to each other beforehand. And if two members of parliament start working on a 
mutual proposal it is possible that they both learned about their mutual interest by 
both opposing a proposal by another member. 

The events that occurred in the past often guide subsequent events and the 
relational event point of view can help uncover not simply how previous interactions 
of the two involved people or actors a and b shape their future interactions but also 
how changes in their surrounding network (i.e., with events that do not even involve 
a or b) affect how a and b interact in the future. It is a powerful framework for time- 
stamped or time-ordered analysis of social interactions that takes the surrounding 
context or a person's or actor's embeddedness into account. REMs thus build on
event history analysis to ask the question: why does an event occur at time t and
not before? And in a broader sense: which patterns of past interactions
can help explain a specific sequence of events? 

Figure 1 depicts a simple event sequence involving four actors (=nodes). The
first three recorded interactions represent support among nodes a, b, and c. After a
longer break, a new node d initiates a negative interaction with b, prompting b to
affirm their positive relationship with node a, which then brings a to oppose d. This
example already hints at the complex surroundings actors are embedded in because
even though actor d does not directly attack actor a, a may react to an indirect threat
that occurs when d attacks b. The figure also shows the additional information that
can be gained by recording events in time, as the time between events holds valuable
information on how strongly future events depend on past events. The assumption
that a reacts to an indirect threat by d is weakened somewhat by the long time it
takes a to oppose d (4 time units in Fig. 1). The additional information on the timing
of events can be used when encoding patterns, further discussed in Sect. 4.

Fig. 1 Illustration of a relational event sequence depicting positive and negative interactions
among four nodes a, b, c, and d (legend: support, opposition)

Fig. 2 Counting process data setup to estimate relational event models for the event sequence
presented in Fig. 1. Each event in the event sequence forms a true event and is compared against
events that could have occurred at time t but did not (so-called null events). The simplest definition
of the risk set for each stratum (as shown here) contains events which eventually will take place
but so far have not

In order to analyze the event sequence in Fig. 1, the sequence has to be
transformed into so-called counting process data, first introduced by Andersen and
Gill (1982). Figure 2 shows the setup of the counting process data for the event
sequence presented in Fig. 1. For each unique time point in the event sequence,
a stratum (or risk set) is built, containing both the true event (i.e., the event or
events that occurred at time t) and null events. Null events are events that could
potentially have occurred at time t but did not. The simplest definition of a risk set
D at time t contains the true events that occurred at time t as well as all events that
occur after time t. As the null events can add a considerable number of observations to the data
set, this definition of the risk set (restricted to events that occurred at one point in 
the event sequence) is the most sparse definition. Alternatively, if the event sequence 
allows for repeated events (i.e., interactions between two nodes can occur multiple 
times), a broader definition of the risk set may be desirable. For instance, the risk 
set could simply contain all possible combinations of sender and target nodes. In 
case the event sequence is signed and thus records both positive (e.g., supporting)
and negative (e.g., hostile) interactions among nodes, the risk set can contain all
combinations of sender and target nodes with both positive and negative edges. In
the case of the event sequence presented in Fig. 1, the maximally large risk set could
contain 24 events (4 · 3 · 2 = 24; 4 sender nodes times 3 remaining target nodes
(omitting self-loops) times 2 because each edge can be supporting or opposing) (see
Brandenberger (2018) for additional information on risk set compositions).
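
The broadest risk set described above can be enumerated directly. The following Python sketch uses hypothetical function names and an invented true event; it builds all signed sender-target combinations for four nodes and marks the true event within one stratum of the counting process data.

```python
from itertools import permutations


def full_signed_risk_set(nodes, signs=(1, -1)):
    """All ordered sender-target pairs without self-loops, once per sign."""
    return [(s, t, sign) for s, t in permutations(nodes, 2) for sign in signs]


def stratum_rows(true_event, risk_set):
    """Counting process rows for one stratum: 1 for the true event, 0 for null events."""
    return [(candidate, int(candidate == true_event)) for candidate in risk_set]


nodes = ["a", "b", "c", "d"]
risk_set = full_signed_risk_set(nodes)

# Hypothetical true event: d opposes b at the time point of this stratum.
rows = stratum_rows(("d", "b", -1), risk_set)
```

With four nodes the list contains 4 · 3 · 2 = 24 candidate events, of which exactly one is flagged as the true event in this stratum.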

Once the null events are added to the data, the hazard of event occurrence 
can be estimated. Standard inferential models from event history analysis can be 
employed because REMs assume that events are conditionally independent of one 
another if both exogenous and endogenous covariates are controlled for (Butts, 
2008; Lerner et al., 2013b). The simplest form of the REM models event occurrence 
as a piecewise constant hazard model. This model assumes that the hazard (or 
chance) of an event occurring is constant within a time interval. 

The likelihood that a specific number of events $n_{ij}(t)$ take place on a dyad (i, j)
within the time interval t is given by the hazard rate $\lambda_{ij}(t)$, and then multiplied
by the survival function $\exp(-\lambda_{ij}(t))$, which captures all events that could have
occurred at time t yet did not (see Lerner et al. 2013a, pp. 18-19 and Butts 2008,
pp. 161-163):

$$\Pr(n_{ij}(t)) = \frac{\lambda_{ij}(t)^{n_{ij}(t)} \cdot \exp(-\lambda_{ij}(t))}{n_{ij}(t)!} \tag{1}$$

The probability density of the event sequence E can be gained by multiplying over all
dyads and all unique times $t_1$ to $t_N$:

$$f(E) = \prod_{t=t_1}^{t_N} \; \prod_{ij \in D_{act}(t)} \Pr(n_{ij}(t)) \tag{2}$$

where $D_{act}(t)$ refers to all dyads in which at least one event occurred over E and D
refers to all possible events that could have occurred (Lerner et al., 2013a, pp. 18-19).
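
As a worked illustration of Eq. (1), the following Python sketch evaluates the probability of observing a given number of events on a dyad under a constant hazard. It assumes the Poisson form, i.e., the hazard raised to the number of events, multiplied by the survival term, and divided by the factorial of the number of events. The function names and the example hazard are assumptions made for illustration.

```python
import math


def event_count_probability(n_events, hazard):
    """Probability of n_events on one dyad within one interval (assumed Poisson form)."""
    if n_events < 0 or hazard < 0:
        raise ValueError("n_events and hazard must be non-negative")
    return hazard ** n_events * math.exp(-hazard) / math.factorial(n_events)


def log_density(counts_and_hazards):
    """Sum of log-probabilities over (n_events, hazard) pairs, one per dyad and time."""
    return sum(math.log(event_count_probability(n, h)) for n, h in counts_and_hazards)


# Hypothetical hazard of 0.5 events per interval on a single dyad.
p_none = event_count_probability(0, 0.5)  # reduces to the survival term exp(-0.5)
p_one = event_count_probability(1, 0.5)
```

For a dyad without an event, the expression reduces to the survival term alone, which is how null events enter the likelihood.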

For continuous-time sequences, REMs use duration models or a stratified Cox 
model to model the time-to-next-event (for an example, see Brandenberger, 2018). 
In Fig. 1 these inter-event times are summarized below the curly brackets. If the 
exact timing of events is irrelevant or only discrete-time information is available, 
a stratified Cox model with constant event times can be used (Butts, 2008). The 
stratified Cox model estimates which factors affect event occurrence, i.e., cause
an event to occur during one particular stratum at time t, and assumes that the
baseline hazard of each event is constant within a stratum but varies between strata
(Cox and Oakes, 1984; Allison, 1982; Box-Steffensmeier and Jones, 2004; Allison, 
2014). The stratified Cox model with constant event times can be estimated with a 
conditional logistic regression (Gail et al., 1980; Allison, 1982) and has become the 
most widely used model for REMs (Kitts et al., 2016; Quintane et al., 2014; Vu et al., 
2015). In the conditional logistic regression each stratum (or risk set) compares true 
events, set to 1, to null events, set to 0. Independent and control variables are used 
to explain why true events occurred and null events did not.

The standard output of a REM is comparable to outputs from logistic regressions,
where for each covariate a beta-coefficient is estimated, which reflects this covari- 
ate's weight on the hazard of event occurrence. Coefficients are usually reported as 
log-odds and follow standard interpretations of logistic regressions. 
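
The comparison of true and null events within a stratum can be illustrated with a minimal sketch of the conditional logit log-likelihood, assuming exactly one true event per stratum. The data layout, the function names, and the coefficient values are hypothetical; for actual estimation a dedicated routine for conditional logistic regression would be used.

```python
import math


def stratum_log_likelihood(stratum, beta):
    """stratum: list of (is_true, covariates). Returns log(exp(x_true*b) / sum(exp(x*b)))."""
    scores = [sum(b * x for b, x in zip(beta, covs)) for _, covs in stratum]
    true_scores = [s for (is_true, _), s in zip(stratum, scores) if is_true == 1]
    if len(true_scores) != 1:
        raise ValueError("this sketch assumes exactly one true event per stratum")
    top = max(scores)  # subtract the maximum for numerical stability
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return true_scores[0] - log_denominator


def log_likelihood(strata, beta):
    """Sum the stratum contributions over all strata (risk sets)."""
    return sum(stratum_log_likelihood(stratum, beta) for stratum in strata)


# One hypothetical stratum: one true event, three null events.
# Covariates per candidate event: (inertia, reciprocity).
stratum = [(1, (2.0, 1.0)), (0, (0.0, 0.0)), (0, (1.0, 0.0)), (0, (0.0, 1.0))]
ll_null = log_likelihood([stratum], beta=(0.0, 0.0))
ll_fit = log_likelihood([stratum], beta=(0.5, 0.5))
odds_ratio = math.exp(0.5)
```

With all coefficients at zero, every candidate in the stratum is equally likely, so the contribution is log(1/4). A coefficient of 0.5 on the log-odds scale corresponds to an odds ratio of exp(0.5), i.e., roughly 1.65.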


4 Controlling for Endogenous Network Effects 


The heart of REMs are the endogenous statistics that encode patterns in past 
interactions to help explain event occurrence. REMs can incorporate time-varying 
exogenous and endogenous variables or statistics. They are used to explain why 
some events take place at time t and why they have not occurred before. By encoding 
endogenous patterns in these statistics, complex social mechanisms that guide 
social interactions can be uncovered. Moreover, by calculating different patterns, 
their effect on event occurrence can be quantified and compared to each other, 
illuminating which are the driving factors of social interactions. 

The patterns that are encoded in these endogenous network statistics are limited 
only by the researchers' creativity, theoretical ideas on social mechanisms, and 
computational limitations (as further discussed in Sect. 5). 

There are six commonly used statistics that can be expanded into more complex 
patterns of social interactions (see Fig. 3). Inertia measures whether events have a 
tendency to repeat themselves in the event sequence. Reciprocity measures whether 
a previous target node (node a in Fig. 3) directs an event at the previous sender 
node (b). Activity measures how active a sender node is over the course of the 
event sequence and popularity measures how popular a target node is. Closing 
triads measures whether two nodes engage with each other due to their previous 
engagements with a shared partner (node b in Fig.3) and four-cycles measure 
whether indirect engagements (nodes a and d in Fig. 3) drive network closure. In 
directed event sequences closing triads and four-cycles can be used to operationalize 
different closure effects (e.g., cycles or transitive triads). Inertia, activity, popularity,
and four-cycles can also be used in two-mode event sequences (where sender nodes
and target nodes stem from different node sets).

Fig. 3 Classic endogenous network effects can be used to test different interaction patterns in
temporal event sequences

In the case of country-to-country military aggression, inertia reflects repeated 
attacks from one country to another and reciprocity reflects whether countries have 
a tendency to retaliate. Activity measures whether some countries are more active in 
pursuing alliances or conflicts and popularity measures whether some countries are 
attacked (or called to an alliance) at higher rates than others. Closing triads measure 
whether events involving shared partners spur countries into action (for instance to 
defend their allies) and four-cycles measure whether there is a tendency for certain 
countries to remain neutral versus one another (countries a and d in Fig. 3). 
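
Four of these statistics can be computed by simple counting when every past event receives a constant weight of 1. The following Python sketch uses hypothetical function names and an invented list of past events given as (sender, target) pairs; it is an illustration and not the implementation of any particular software package.

```python
def inertia(past_events, a, b):
    """How often a has already directed an event at b."""
    return sum(1 for sender, target in past_events if (sender, target) == (a, b))


def reciprocity(past_events, a, b):
    """How often b has directed an event at a in the past."""
    return sum(1 for sender, target in past_events if (sender, target) == (b, a))


def activity(past_events, a):
    """How often a has acted as a sender."""
    return sum(1 for sender, _ in past_events if sender == a)


def popularity(past_events, b):
    """How often b has been a target."""
    return sum(1 for _, target in past_events if target == b)


# Hypothetical past events preceding a candidate event (a, b).
past_events = [("a", "b"), ("b", "a"), ("a", "b"), ("c", "b")]
stats = {
    "inertia": inertia(past_events, "a", "b"),
    "reciprocity": reciprocity(past_events, "a", "b"),
    "activity": activity(past_events, "a"),
    "popularity": popularity(past_events, "b"),
}
```

Each candidate event in a stratum, true or null, receives such a set of values, which then enter the model as covariates.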

These six statistics can be made more complex by allowing for attributes of 
the nodes (i.e., checking reciprocity levels among countries with the same national 
language) or by incorporating edge attributes (such as filtering triadic closure for 
positive and negative ties). The two empirical examples introduced in the next 
section both incorporate some of these commonly used statistics in their REM 
and by including edge and nodal attributes find evidence for more complex social 
patterns of interaction that lead to social cohesion and balance. 

Another important component of endogenous network statistics is how they 
incorporate time. Temporal dynamics of social interactions are crucial in understanding
how interactions evolve and build up over time. For relational events,
a so-called network of past events $G_t$ is built for each true and null event belonging
to the same stratum at time t (i.e., belonging to one unique point in time on the event
sequence). It looks back over the event sequence prior to time t to determine
whether previous events can explain which events in the stratum are true events and
which are null events.

The network of past events is defined as

$$G_t = G_t(E) = (A; B; w_t), \tag{3}$$

where $E = (e_1, e_2, \ldots, e_n)$ represents the set of events, A is the set of sender nodes,
B the set of target nodes (where A = B for one-mode networks), and $w_t$ represents
a weight function that can be applied to each event before time t.

The weight function in its simplest form gives a constant weight of 1 to each past
event in $G_t$. However, the weight function can also be used to give events further in
the past less weight than more recent events (Lerner et al., 2013a). For instance, an
exponential decay function can be used to account for memory loss or forgetting:

$$w_t(i, j) = \sum_{e:\, a_e = i,\; b_e = j,\; t_e < t} |w_e| \cdot \exp\!\left(-(t - t_e) \cdot \frac{\ln(2)}{T_{1/2}}\right) \cdot \frac{\ln(2)}{T_{1/2}}, \tag{4}$$

where $i = a_e \in A$ and $j = b_e \in B$, $w_e$ is the weight of event e, t is the current
time, and $t_e$ is the time of event e. $T_{1/2}$ represents the value of the half-life parameter.
The half-life parameter specifies how fast the weight of past events diminishes, with
a smaller half-life giving more weight to more recent events (Lerner et al., 2013a).
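
The effect of the half-life parameter can be made concrete with a short Python sketch. It assumes the exponential decay form of Eq. (4), in which each past event on the dyad contributes its absolute weight, discounted by its age and scaled by ln(2) divided by the half-life. The function name, the tuple layout of the events, and the example values are assumptions made for illustration.

```python
import math


def decayed_weight(events, i, j, t, half_life):
    """Exponentially decayed weight of past events i -> j strictly before time t.

    events: iterable of (sender, target, time, weight) tuples.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    rate = math.log(2) / half_life
    return sum(
        abs(weight) * math.exp(-(t - t_e) * rate) * rate
        for sender, target, t_e, weight in events
        if sender == i and target == j and t_e < t
    )


# Hypothetical events from b to a; only the first one lies strictly before t = 20.
events = [("b", "a", 10.0, 1.0), ("b", "a", 20.0, 1.0), ("b", "a", 25.0, 1.0)]
recent = decayed_weight(events, "b", "a", t=20.0, half_life=10.0)
```

In this example the only past event is exactly one half-life old, so its decay factor is 0.5 and the resulting weight equals 0.5 · ln(2)/10.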

The weight function is applied to each past event that completes a social pattern
or mechanism. For instance, the reciprocity statistic for the dyad (a, b) counts how
often an event (b, a) occurred in the past:

$$\text{reciprocity}(G_t, a, b) = w_t(b, a) \tag{5}$$

For each past event (b, a), the time difference $(t - t_e)$ to the current event dictates
how much weight should be ascribed to this past event. The triadic closure statistic
can be operationalized as:

$$\text{closingTriad}(G_t, a, b) = \sum_{i \in A} w_t(a, i) \cdot w_t(i, b) \tag{6}$$


This approach allows for the testing of complex social mechanisms that involve 
multiple people or actors and follow distinct paths. 
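
The following Python sketch shows how the reciprocity and triadic closure statistics can be derived from a weight function. For simplicity it assumes that the weight function sums the absolute weights of past events without time decay, and that the triadic closure statistic is the sum over shared partners of the product of the two path weights. All names and example events are hypothetical.

```python
def past_weight(events, i, j, t):
    """Simplified w_t(i, j): summed absolute weight of events i -> j before time t."""
    return sum(
        abs(weight)
        for sender, target, t_e, weight in events
        if sender == i and target == j and t_e < t
    )


def reciprocity(events, a, b, t):
    """Weight of past events in the reverse direction (b, a)."""
    return past_weight(events, b, a, t)


def closing_triad(events, a, b, t, nodes):
    """Sum over intermediaries i of w_t(a, i) * w_t(i, b)."""
    return sum(past_weight(events, a, i, t) * past_weight(events, i, b, t) for i in nodes)


# Hypothetical event sequence given as (sender, target, time, weight).
events = [
    ("a", "c", 1, 1.0),
    ("c", "b", 2, 1.0),
    ("a", "d", 3, 1.0),
    ("d", "b", 4, 2.0),
    ("b", "a", 5, 1.0),
]
nodes = ["a", "b", "c", "d"]
triad_late = closing_triad(events, "a", "b", t=6, nodes=nodes)
triad_early = closing_triad(events, "a", "b", t=4, nodes=nodes)
recip = reciprocity(events, "a", "b", t=6)
```

At t = 6 both two-paths from a to b (via c and via d) are complete, whereas at t = 4 the event from d to b has not yet entered the network of past events, so only the path via c counts.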

A number of statistical tools are available to estimate REMs. They include 
commands to prepare the data structure of REMs, calculate the endogenous network 
statistics, and estimate REMs. In the statistical computing environment R (R Core 
Team, 2016), the packages relevent (Butts, 2015) and rem (Brandenberger, 
2019) are available to run REMs, where the latter supports exponential time 
weighting of past events. Alternatively, a Java-based tool called eventnet (Lerner,
2019) is available for the analysis of event networks. 


5 Empirical Examples of Alliance Formation and Social Influencing


In this section, two empirical examples are presented where alliance-building 
patterns and social mechanisms behind social cohesion are discussed and analyzed 
using relational event data. 


5.1 Military Alliance-Formation Dynamics 


Lerner et al. (2013a) model military engagements among nations involved in 
the Gulf region. They build on balance theory (Heider, 1946; Newcomb, 1961) 
to examine whether alliance-formation patterns follow the proposed patterns of 
interactions. Balance theory postulates that relationships among individuals or 
actors always have to be balanced in order to endure the passage of time. For
instance, in a triad (i.e., a triangular relationship pattern involving three actors), the
number of positive relationships among the three actors has to be odd.
[Figure: four signed triads, with ties marked as support or opposition. Panels: a friend of my friend becomes my friend; an enemy of my enemy becomes my friend; an enemy of my friend becomes my enemy; a friend of my enemy becomes my enemy.]

Fig. 4 Hypotheses of balance theory: Triads are only stable if their number of positive ties is 
odd. The focal actor a becomes active in closing the open triad (dashed relation from node a to 
c). Lerner et al. (2013a) find strong evidence of signed triadic closure in military engagements of 
different nations, indicating that alliances are formed and maintained based on the mechanisms 
postulated by balance theory 


Figure 4 shows all four possible combinations of closed triads with signed edges.
All four triangles are balanced if the dyad (a, c) is closed with an edge of the
respective sign. Lerner et al. (2013a) use REMs to test whether nations have a
tendency to close triads and if so, whether these triads are closed in the way balance 
theory suggests (and depicted in dashed edges between nodes a and c in Fig. 4). 
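
The closing rule implied by balance theory can be written compactly: if support is coded as +1 and opposition as −1, the tie that balances a triad is the product of the two existing signs, which is equivalent to requiring an odd number of positive ties. The following Python sketch illustrates this; the sign coding and the function names are assumptions made for illustration and are not taken from Lerner et al. (2013a).

```python
def balanced_closing_sign(sign_ab, sign_bc):
    """Sign of the tie (a, c) that balances a triad, given the signs of (a, b) and (b, c)."""
    for s in (sign_ab, sign_bc):
        if s not in (1, -1):
            raise ValueError("signs must be +1 (support) or -1 (opposition)")
    return sign_ab * sign_bc


def is_balanced(sign_ab, sign_bc, sign_ac):
    """A closed triad is balanced if its number of positive ties is odd (3 or 1)."""
    positives = sum(1 for s in (sign_ab, sign_bc, sign_ac) if s == 1)
    return positives % 2 == 1


# The four hypotheses: (sign of my tie to b, sign of b's tie to c) -> my tie to c.
hypotheses = {
    "friend of my friend": balanced_closing_sign(1, 1),
    "enemy of my enemy": balanced_closing_sign(-1, -1),
    "enemy of my friend": balanced_closing_sign(1, -1),
    "friend of my enemy": balanced_closing_sign(-1, 1),
}
```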

They use data from the Kansas Event Data System (KEDS) on military actions
in the Gulf region between 1979 and 1999. The event sequence involves over 200,000
events among 168 nations (Lerner et al., 2013a, p. 7). Events are coded as one-mode
network events, with nation × nation interactions encoded in time. Additionally,
for each event, a weight in the range [−10, 10] encodes the strength of the
interaction, with positive values indicating friendly interactions and support between
the two nations, and negative values denoting military aggression. They use
the eventnet application to estimate the effects of (balanced) triadic closure on
event occurrence. Their results give strong support to balance theory. Moreover, they
find that controlling for external alliances no longer yields any additional insights
if reciprocity and triadic closure are adequately controlled for in the model (Lerner
et al., 2013a, p. 28). This indicates that this form of balanced triadic closure can
properly represent alliance-formation processes in international relations.


5.2 Influencing Dynamics Among EU Parliamentary Chambers


Malang et al. (2018) examine social influencing dynamics among national parliaments
in Europe. They examine the case of the Early Warning System of the
EU, where national parliaments in Europe are allowed to veto legislative proposals
brought forth by the European Commission. If over one third of the parliaments
(or, more precisely, parliamentary chambers) veto the same proposal, the European
Commission is forced to re-evaluate and defend the proposal. Malang et al. (2018)
argue that this vetoing threshold creates an incentive for chambers to influence each
other and try to get other chambers to veto the same proposal.

They use data from 353 vetoes issued between January 2010 and September
2016 by 39 parliamentary chambers in the EU. Here, the events examined are not
person × person but chamber × proposal events, i.e., a two-mode (or bipartite) event
sequence, where sender nodes are different from target nodes and events record
interactions or engagements between the two. The vetoes can be thought of as
conflictual interactions of chambers with legislative proposals.

Previous studies on these vetoing actions link a chamber’s decision to veto a
proposal to its general attitude towards the EU, how long it has been part of
the EU, or its capacity to evaluate each proposal. However, these studies neglect
the relational aspect that such collective vetoing dynamics bring with them. In order
to reach the threshold, one third of all chambers have to issue a veto. However, evaluating
each proposal and drafting a veto takes time and resources, so chambers may
simplify their decision to veto by reacting to previously issued vetoes
and taking them as a signal to veto the proposal as well. Alternatively, chambers
may try to influence each other by directly approaching members of
parliament and convincing them to veto a proposal. Both explanations imply a
strong social mechanism that drives vetoing behavior. One important question that 
arises is whether chambers influence each other based on distinct attributes they 
share. Do chambers with a specific attribute only react to vetoes issued by other 
chambers who share that attribute? And which attributes have this signaling power? 

Malang et al. (2018) examine four different attributes of chambers through which 
this social influencing can run. They test whether two chambers are currently ruled 
by a party from the same party family (i.e., measuring left-right leaning similarities 
of chambers), whether two chambers are governed similarly and embedded in the 
same political system, whether two chambers joined the EU at the same time or 
whether the chambers are from neighboring countries. They use a popularity statistic 
and enhance it with chamber (or country) attribute homophily in order to test which 
similarities guide vetoing dynamics: 


\[
\mathrm{chamberHomophily}(G_t, a, b) = \sum_{i \in A} w_t(i, b) \cdot [a_x = i_x] \tag{7}
\]


where a refers to the focal chamber deciding to veto a proposal b, and i refers to
other chambers that have vetoed proposal b in the past and share the same attribute
x as the focal chamber (indicator function [a_x = i_x]) (Malang et al., 2018, p. 13).
For each of the four proposed channels of influence, they calculated the
chamberHomophily statistic. Parameter estimates for the four different operationalizations
of homophily (as well as a broad range of control variables) were
obtained from a conditional logistic regression on an ordinal-timed sequence of
vetoes. They use the rem package in R to calculate the homophily statistics
and estimate the REM. Results of the REM revealed that only two of the four
homophily statistics had positive and significant effects on vetoing dynamics.
Chambers governed by parties from the same party family and embedded in similar
political systems tend to veto the same proposals (Malang et al., 2018, p. 16).
Using a permutation approach on the vetoing sequence, they further show that only
party family similarities drive influencing dynamics because they produce event
sequences with strong temporal correlation among the order of events (Malang et al.,
2018, pp. 14–15).
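
To make the homophily statistic concrete, the following Python sketch computes it for a two-mode veto sequence. It rests on several assumptions that are not confirmed by the source: past vetoes are summed over all other chambers i, every veto carries unit weight, and the decay follows the half-life form described for the weight function. The function name, the data layout, and the toy values are hypothetical; this is not the interface of the rem package.

```python
import math


def chamber_homophily(vetoes, attribute, a, b, t_now, half_life):
    """Decayed count of past vetoes of proposal b by chambers sharing a's attribute.

    vetoes:    iterable of (chamber, proposal, time) tuples
    attribute: dict mapping each chamber to its value on attribute x
    """
    rate = math.log(2) / half_life
    total = 0.0
    for chamber, proposal, t in vetoes:
        same_proposal = proposal == b
        in_past = t < t_now
        shares_attribute = chamber != a and attribute[chamber] == attribute[a]
        if same_proposal and in_past and shares_attribute:
            total += math.exp(-(t_now - t) * rate) * rate
    return total


# Toy data (assumed): chambers X and Y vetoed proposal p1, Z vetoed p2.
past_vetoes = [("X", "p1", 0.0), ("Y", "p1", 0.0), ("Z", "p2", 0.0)]
party_family = {"A": "left", "X": "left", "Y": "right", "Z": "left"}
stat = chamber_homophily(past_vetoes, party_family, "A", "p1", t_now=10.0, half_life=10.0)
```

Only the veto by X contributes here: Y belongs to a different party family and Z vetoed a different proposal.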


6 Discussion 


With the increasing capabilities of automated data collection, gaining access to data with
higher temporal resolution of social interactions (both on- and offline) has become 
easier. The added temporal information in the data in turn sharpens empirical 
evidence and opens up new avenues of research on social interaction patterns. Social 
mechanisms that guide social interactions can be analyzed in much more detail and 
opposing hypotheses can be tested against each other. 

These new avenues also pose a challenge: Many theories on social interactions 
do not offer clear insights into how social mechanisms can be operationalized and 
tested. However, the long-standing tradition of modeling dynamic behavior in agent- 
based models (ABM) can offer important insights. REMs can borrow interaction 
rules and patterns from ABMs to help operationalize different patterns in event 
networks. Furthermore, REMs can be used to examine the interplay of different 
patterns (e.g., through interaction effects of different mechanisms). Often social 
interactions evolve and develop over time and REMs can track these changes in 
behavior (for instance through temporal interaction effects) to understand how some 
social mechanisms evolve into others or under which circumstances (i.e., a distinct 
period of time, for instance an election cycle) some mechanisms dominate others. 
REMs can also be used to examine whether social mechanisms differ between groups.
For instance, Brandenberger (2018) shows that reciprocity guides the collaboration
strategies of Republican members of Congress, but not those of Democratic members.

One limitation of REMs is their computational cost. Particularly complex
(or higher-order) endogenous network statistics are challenging to compute as the
calculations have to cycle through the event sequence several times to determine
which past events contribute to a certain interaction pattern. This is especially
difficult if event sequences are large, as the network of past events G_t becomes too
extensive to filter. Sampling strategies can help alleviate this issue and reduce the
computational burden.² However, it remains to be tested which sampling strategies
prove efficient for REMs in that they do not prevent the detection of complex social
patterns in large and sometimes noisy social interaction data.
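
One conceivable sampling scheme, sketched here purely as an illustration, is to keep every observed event and draw only a small random set of non-occurring dyads from the risk set as controls, so that network statistics need to be computed for far fewer dyads. The chapter does not prescribe this scheme; the function name, the one-mode risk set without self-loops, and the number of controls are all assumptions, and whether such sampling preserves complex patterns is exactly the open question raised above.

```python
import random


def sample_controls(observed, senders, targets, n_controls, rng):
    """Draw dyads from the risk set that did not occur, excluding self-loops."""
    candidates = [
        (s, r)
        for s in senders
        for r in targets
        if (s, r) != observed and s != r
    ]
    # Sample without replacement; return all candidates if fewer are available.
    return rng.sample(candidates, min(n_controls, len(candidates)))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
nations = ["a", "b", "c", "d"]
controls = sample_controls(("a", "b"), nations, nations, n_controls=3, rng=rng)
```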


² Sampling strategies are often used in rare event logistic regressions (see, for example, King and
Zeng, 2001) and could be adapted to REMs.
Abstract How do parties discuss migration policy in legislative speeches? Legislative
bodies are an arena for verbal conflicts where the parties vie for their ideological
interests but also sharpen new rhetorical figures. Political parties develop policy 
stances strategically different from those of competing parties and elucidate those 
stances through legislative debates and public statements. A large body of literature 
argues that some issues, like migration, fall in a gap between established societal 
cleavages over which parties do not have robust, issue-specific ownership. More 
recent research suggests migration may be a part of a new transnational cleavage 
that pits cosmopolitan sensibilities against nationalist sentiments in a conflict over 
issue ownership and policy framing. Building off this debate, we hypothesize 
that parties discuss migration topics by diverting attention to subcomponents of 
migration policy over which they have established issue ownership. Using machine 
learning techniques, we test this assertion by measuring the differences in salience 
and framing of migration-related topics over time in the debates of the lower
houses of Canada and the United States (the Canadian House of Commons and
the United States House of Representatives) from 1994 to 2016. We find that there
are substantive differences in the emphasis on and framing of migration policy
between the two ideological blocs. Democrats in the USA and liberals in Canada
emphasize subcomponents of the migration debate which they traditionally own,
such as welfare and humanitarian aspects. Both conservative blocs do the same
by framing their discussion of migration through a focus on the security and legalistic
aspects of migration. However, due to strong polarization in the USA, the differences
in the emphasis on the issues traditionally owned by the two ideological camps are
stronger in the USA than in Canada.
1 Introduction


Conflict is part and parcel of politics on the international, national, and local levels. 
Morgenthau's influential work defined politics as a struggle for power (Morgenthau 
1960). Originating in legislative bodies of modern democracies, parliamentary 
debates offer an optimal source of information for an in-depth analysis of political 
conflict (Grimmer and Stewart 2013). Migration policy, the substantive focus of this 
chapter, is not only a prominent, much debated political and research issue, but its 
framing in legislative bodies also poses a theoretical puzzle. 

Most political issues, as discussed in the issue ownership literature, fall in either 
the liberal or conservative political domain. This proposal is based on the nature 
of liberal and conservative¹ political issue ownership, where parties develop stances
on issues that resonate with voters so that a common citizen would be able to associate a
particular issue or issue frame with the "correct" political party (Petrocik 1996).
For instance, voters might reasonably associate social welfare policy domains with 
liberal parties or military and defense policies with conservative parties. In this 
case, on these two issues, the two political camps have issue ownership. However, 
migration policy has been argued to represent an exception as it creates conflicts 
within both ideological camps—market liberalism versus value conservatism (for 
the conservatives) and international solidarity versus welfare state/labor market 
protectionism (for the liberals) (Odmalm 2011a). This leads us to the question: 
how do liberal and conservative parties discuss migration issues in parliamentary 
speeches? 

The literature suggests parties compete against each other by framing policies 
so they resonate with voters and their own platforms (Chong and Druckman 
2007a, b; Nelson and Kinder 1996). We suggest that parties debate migration policy 
by emphasizing the subcomponents of the migration policy which they own. In 
addition, we suggest parties will emphasize their strengths more in polarized party 
systems as this allows their rhetoric to capture the median voter more regularly. We
expect that in a less polarized system, there will be less difference in the emphasis parties
place on issues they own.

While much of the literature on issue ownership and policy framing—upon 
which our work builds—focuses on campaigning and media statements, this chapter 
analyzes migration policy in legislative speeches. Lefevere et al. (2017) argue 
policy framing is aimed at either winning the rhetorical struggle or at gaining 


¹ We refer to liberal and conservative ideology in the North American context, which follows the
lines of the left-right political spectrum. Liberal ideology is identified as left-leaning, egalitarian, 
multicultural, and in support of social policies that appeal to the working class. Conservative 
ideology stands for right-leaning, orientation to tradition, and protection of private property and 
individualism. 
public support. Thus, we argue that floor speeches allow members and their staff
to sandbox ideas for talking points or policy frames in a low-stakes environment
prior to developing or revising a party manifesto, campaign platform, or press 
release. Moreover, being the center stage of the formal conflict between the parties, 
legislative speeches are a weapon parties use to demonstrate their strengths in the 
rhetorical arena. 

To test our hypotheses regarding migration policy framing and polarization, we 
use a most similar comparative case study framework (Anckar 2008; Gerring 2009) 
and select the cases of the United States and Canada. Both countries were founded
largely by immigrants, and migration policy has been a constant topic of discussion
for decades in both the USA and Canada. While the two share a clear ideological
divide between liberal and conservative camps, they differ regarding the degree of
political polarization, the effect of which we are interested in exploring through
our comparative case study design.

Our analysis includes debates in the lower houses of the national legislatures in
Canada and the United States (the Canadian House of Commons and the United
States House of Representatives) from 1994 to 2016. As the migration debate is one
that has existed in the political sphere of Canada and the United States since their 
founding as nations of immigrants, we test our hypotheses on a wide, contemporary 
timeframe that captures both liberal and conservative legislatures and executives as 
well as different waves of migration patterns from a variety of conflicts spanning 
much of the world. 

Methodologically, we rely on unsupervised machine learning for our analysis 
of text data. The availability of the legislative speeches in easily accessible digital 
form and the computational power necessary for the analysis of large datasets at 
hand have jointly led to a surge of interest in analysis using automated text analysis 
techniques (see also Maerz and Puschmann in the chapter "Text as Data for Conflict 
Research: A Literature Survey" of this volume). Automated text analysis can be 
used to discover new patterns and structures in the political texts, as well as to 
verify how known covariates affect these patterns. We use structural topic modeling 
to identify specific migration-related topics in the dataset and to understand the 
differences in issue salience and policy framing between the parties. 

The chapter is structured as follows. Section 2 outlines the relevant theoretical
literature and introduces hypotheses. Section 3 summarizes research design, data,
and methods. Section 4 presents our results and discusses the implications. Section 5
summarizes findings and real-world implications of this work and discusses future
avenues we see as viable and helpful in this research realm.


2 Theory 


2.1 Party-Based Issue Ownership 


Issue ownership is an established theoretical framework within which most work 
focuses on party manifestos and campaigns. Issue ownership represents "the 
perceived competence in handling issues and problems" (Stubager and Slothuus
2013). Generally, the issue ownership framework holds that campaigns set the
criteria for voter choice and that candidates emphasize issues on which they are
advantaged and their opponents disadvantaged during a campaign in order
to influence voter choice in the election. In the context of our study, issue ownership
campaigning has been shown to take place in presidential elections in the United 
States and in national elections in Canada (Bélanger 2003; Petrocik 1996). Walgrave 
and De Swert (2007) find that both the party, from the time of inception and 
their driving manifesto, and the media, from the discussion surrounding the party 
and their actions, contribute to the establishment of issue ownership although the 
direction of the causal arrow remains unclear between the two. Issue ownership 
comprises more than just partisanship or attitudes; rather it includes constituency- 
based ownership and perceived development as it relates to the real world (Stubager 
and Slothuus 2013). 

While issue ownership is a well-defined concept that clearly matters for voters 
choosing between parties and candidates at the ballot box, it is established through
party platforms, manifestos, and, via media coverage, in everyday politicking 
(Lefevere et al. 2015). Scholars suggest that issue ownership can also be identified 
in legislative speeches (Green-Pedersen and Mortensen 2010; Sulkin 2005; Vliegen- 
thart and Walgrave 2011). We use transcripts from the floors of national legislatures 
to see how issue ownership relates to the migration policy in political debates. In 
the next section, we discuss policy framing as a method that political actors use in 
political debates and communication. 


2.2 Policy Framing 


According to Sniderman and Theriault (2004), a frame is a "central
organizing idea or story line" that relates the policy domain in contention to the 
public such that it resonates. Critically, multiple frames can be applied to the 
same issue and can be placed in competition with each other. Further, frames can 
be linked through issue ownership to distinct parties (ibid). Within this realm of 
competition over policy domains and attention from the public, politicians deploy 
policy frames to compete over the dominant train of thought or association over 
relevant policy domains (Nelson and Kinder 1996). The issue ownership literature 
suggests that parties strategically emphasize issues they own to boost electoral 
prospects. However, when external events or crises force them to address issues 
their opponents hold an advantage over, parties frame the issues by choosing to 
focus on a subcomponent of an issue (De Vreese 2005). To provide an example 
within the context of migration policy, liberals can leverage framing by discussing
the humanitarian subcomponent of a proposed migration policy.

In the contest over policy framing and issue dominance, policymakers
compete for multiple audiences. Parties can seek to frame issues for individuals or


Migration Policy Framing in Political Discourse: Evidence from Canada and the USA 87 


journalists differently and the dominant framing strategies depend on the design and 
implementation within the particular audience environment (Chong and Druckman 
2007b). 

Research suggests that there are two factors—quality of frames and competition 
over policy frames—that play into policy framing’s impact on public perceptions 
that reduce to a question of quality versus quantity of framing (Chong and 
Druckman 2007a, b). Using experimental design, Chong and Druckman (2007a) 
find that the strength or quality of the frame matters most to citizens. Moreover, the 
general debate rallies where competition over policy frames arises. Further, for a 
policy frame that does not resonate with the population, the resulting preferences 
rely on the underlying value distribution of the population. However, with a strong 
frame, public opinion was successfully changed for both competitive and non- 
competitive issues (ibid). Thus, inherent in the contest over policy framing is a 
contest over the lens through which voters view the policy at hand and a successful 
frame can be hugely beneficial to a party’s ability to win the policy debate and 
achieve policy goals in line with their platform. 

Additionally, policy framing can be followed by policy reframing, where parties 
deliberately shift the policy frame from one issue to another, separate issue. 
Reframing typically occurs when the debate is topically relevant and matters to 
the party. Parties push the newly reframed issue to bring the debate back in line 
with the domain the party owns (Lefevere et al. 2017). The literature argues 
that parties not only frame and reframe based on issue ownership, but also that 
issue ownership can be bolstered by proper policy framing, especially in areas of 
contested ownership like the immigration debate sphere (Hänggli and Kriesi 2010).
Empirically, the literature has only started to disentangle the occurrence and effect 
of framing and reframing in political debates (ibid.). Continuing with our previous
example of migration policy, liberals could decide not to focus on the migration-related
humanitarian subcomponent, but rather to shift the debate to humanitarianism
proper as a completely distinct policy domain. Next, we consider the role of issue
ownership within the migration policy domain specifically as it presents complex 
dynamics within the debates. 


2.3 Inter-Party Contest over Migration Policy 


Most issue ownership debates scaffold up from Downsian ideas of democracy, in
which parties work to win votes by adopting uniquely dominant policy stances
that cross a number of issues (Downs 1957). The tactic of building issue-based
ownership is further complicated when one issue, like migration, encompasses
multiple, opposed issues. Odmalm (2011b) argues that the questions raised by
immigration policy split both traditionally liberal and traditionally conservative
parties down the middle, since immigration touches on both moral liberalism and 
value conservatism while also including international solidarity and welfare system 
protectionism, thus causing internal strife within both major ideological camps. 
Specifically, he argues that migration and other policy areas that have “old” or
economic value-laden issues and “new” or socio-cultural issues present these types 
of internal conflicts for parties (ibid). Odmalm (2012) argues that in cases like 
these, parties divert to subtopics within the broader topic on which they have a 
strategic advantage and can hold dominant stances but ultimately are not well 
equipped to handle a complex idea like migration with its cross-issue composition. 
This argument is exemplified in research pointing to party restructuring within 
Europe. On topics of European and national identity, Lahav (1997) finds that 
traditional right-left party construction “has been reinvented which is mirrored in 
the debates on immigration within the EU.” Further, Helbling (2014) finds that policy
framing on migration topics occurs across Western Europe based on the political
events surrounding the debate and the actors engaged in the debate, thus linking the 
political debate around migration topics to both issue ownership and policy framing 
literatures. 

This logic of migration and other policy domains that fall between party- 
based issue ownership potentially restructuring party-issue orientation flows from 
cleavage theory. Within cleavage theory, first, party systems are “determined in 
episodic breaks from the past,” second, parties are generally inflexible in their 
approaches to their core issues and beliefs, and third, new parties result in changes 
to the current systems (Hooghe and Marks 2018). A signal of this could be parties
shifting away from issues that reinforce their cores and instead stretching those cores across
multiple, competing issues, thus weakening them and possibly allowing space for
new parties to form. An indication of this could be the inability to formulate a
cohesive party message on the floors of the legislature on key policy issues like 
migration (Hooghe and Marks 2018; Odmalm 2011b, 2012). Yet the study of 
issue ownership and policy framing on migration topics is dominated by studies of 
European contexts (Helbling 2014; Lahav 1997; Odmalm 2011a). This study works
to bring this important question to the shores of North America where a different, 
but no less salient conversation is ongoing. 


2.4 Hypotheses 


Building on the theoretical frameworks of issue ownership and policy framing and 
on the literature that uses computational tools to examine the content of political 
texts, we focus on identifying issue ownership, changes in issue salience, and patterns
of policy framing in US and Canadian lower legislative chamber speeches.
According to Odmalm’s theory, parties will avoid topics which raise inconsistencies 
along “old” (economic issues, e.g., taxation and welfare) and “new” (multicultural, 
environmental) cleavages (Odmalm 2011b). Even when parties change their rhetoric 
due to popular dissatisfaction with immigration policies, their framing or tone of 
the issue will reflect the (in)stability of the societal fault lines, and the relative fit 
between these cleavages and parties’ choice of issue framing (economic or socio-cultural)
(Odmalm and Super 2014).

We suggest parties will discuss migration in a way that allows them to focus on
subcomponents of the migration topic over which they have issue ownership. More 
precisely, we expect that migration-related issues over which liberal issue ownership 
is well established, such as poverty, education, weapon control, environment, and 
health will be more often debated by Democrats in the USA and liberals in Canada. 
By contrast, we expect Republicans in the USA and conservatives in Canada to
more often discuss migration-related topics such as external threats and security.
Due to differences in polarization between the two countries, we hypothesize that 
differences in emphasis on the migration-related topics will be stronger in the USA 
than in Canada. More specifically, we expect that differences in the emphasis parties 
place on migration-related subtopics and in policy framing will be smaller between 
liberals and conservatives in Canada than between Republicans and Democrats in 
the USA. 


3 Data and Methods 


3.1 Comparative Case Study Approach 


In order to test our hypotheses, we adopt Mill's most similar systems design 
(Gerring 2009). The two selected cases are the United States and Canada. The 
two countries hold the lion's share of power on the North American continent and 
are geographically isolated from other states. Apart from their common border, the 
USA has a significant southern land border with Mexico, while Canada has a large 
but mostly uninhabited northern border. Both face challenges with legal, illegal, 
and humanitarian-based migration, although to differing degrees in those categories. 
Both have dominant conservative and liberal ideological camps that historically and 
competitively vie for power, thus allowing our study to examine the differences in 
language used in legislative debates. 

However, they differ with regard to party polarization. As a two-party system,
the United States serves as an example of high polarization. Canada, as a multi-party
system, despite recent changes since 2007 (Brady 2014), is recognized in
the literature as the epitome of a non-polarized system (Johnston 2015). These
differences allow for the exploration of migration topics across two similar political
systems that differ with regard to the levels of polarization. Importantly, both are
nations founded (largely) by immigrants and are often seen as bastions of hope 
for those trying to start a new life. Therefore, the two case studies are very different 
from the European nation-states on which the recent migration literature has focused 
(Lahav 1997; Odmalm 2011a, 2012). From a practical perspective, two points 
motivated this case selection because they facilitate comparison. First, English- 
speaking Canada and the USA face many of the same policy issues. Second, both 
legislatures publish detailed transcripts of debates in English. 
For the United States analysis, we use Gentzkow et al.’s (2018) Congressional
Record for the 43rd–114th Congresses: Parsed Speeches and Phrase Counts data,
which captures all floor debates from the United States House of Representatives.
We subset the data to include only the House of Representatives’ speeches from
Winter 1994 to Fall 2016. For the Canadian analysis, we collected debates from
the House of Commons from Beelen et al. (2017). We use the OpenParliament.ca
PostgreSQL dump provided by Beelen et al. (ibid.), which covers speeches from
1994 onward, and trim the US data accordingly. Similarly, we remove speeches in the
Canadian data that come after 2016, as these would have no counterpart in the US
data. We focus only on the lower houses of the national legislatures to maintain
comparability across cases. This is necessary as the United States Senate is an
elected body and the Canadian Senate is not. From there, we processed the data into
text corpora for analysis.²

To better characterize differences in rhetoric on the topics of migration and
build a comparison between the United States and Canadian political sectors,
we divided parties into "liberal" and "conservative" groups. For the USA, this
was straightforward, with Democrats as the liberal camp and Republicans as the
conservative camp.³,⁴ However, in Canada, the divide is more complicated. In this
paper, when we discuss "conservative" parties in Canada, we are referring to the
Conservative (C), Canadian Alliance, Progressive Conservative (PC), and Reform
parties. Liberal (L), New Democratic Party (NDP), and any successors of the Bloc
Quebecois parties all become "liberal."⁵ In the USA, this time span includes the
Presidencies of Bill Clinton (D), George W. Bush (R), and Barack Obama (D) and
the 103rd–114th Congresses, within which the majority party changed several times.
In Canada, this spans the Campbell (PC), Chretien (L), Martin (L), and Harper (C)
Governments, which also represent liberal and conservative blocs in power. Further,
this timespan includes migration during the fall of Yugoslavia and ensuing violent
conflicts, the genocide in Rwanda, both Desert Storm and Desert Shield/Operation
Iraqi Freedom, the September 11, 2001 attacks, the war in Afghanistan, the Arab
Spring protests, the beginning of the Syrian Civil War, and the rise of violence in
Central America, as well as a number of other conflicts.


2 In addition to the high-level overview in the following sections, detailed steps to replicate this data 
collection process for the United States and Canada are provided in supplemental documentation. 

3 Historically there have been realignments between the parties, which are of interest to the topic 
of migration generally. However, realignments are not an issue in the time period analyzed in this 
paper. 

4 For the US data, we kept the independent members of Congress in the dataset but dropped strictly 
logistical and parliamentary speeches from the data. In the analysis, we are not focusing on the 
Independent members’ speeches as the number of speeches is not representative. For the Canadian 
data, we dropped all strictly logistical parliamentary speeches from the data as well. 

5 It should be noted that other, smaller parties are also collapsed into the liberal category. All parties 
that are not Independents (dropped), or included in the conservative grouping, are collapsed into 
the liberal group. 


3.2 Dataset Subsetting: Dictionary Approach 


To identify migration-related speeches and reduce the size of the corpus in 
initial analyses, we use a dictionary approach to subset the data. The original 
US dataset has 377,817 Democratic speeches, 374,397 Republican speeches, and 
1,840 Independent speeches. The original Canadian dataset has 231,681 Liberal 
Party speeches and 172,089 Conservative Party speeches, with a combined 71,068 
speeches by other conservative parties.6 To investigate the outlined research 
question in the US and Canadian context, we take a simple approach and subset the 
corpora of speeches to include only those that are relevant to migration topics. 
We limit the dictionary to three words: asylum, refugee, and immigration.7 These 
words are stemmed along with our corpora, and we analyze the speeches that include 
at least one of these three words relevant to the migration debate in the United 
States and Canada.8 The resulting datasets include 15,547 (5% of all) and 15,072 
(2% of all) speeches for Canada and the United States, respectively. 
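
The subsetting step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function names and the toy corpus are hypothetical, and simple prefix matching on the stems asylum, refuge, and immigr is assumed here as a crude stand-in for a proper stemmer.

```python
import re

# Stems of the three dictionary words (asylum, refugee, immigration).
# Prefix matching is an assumption standing in for real stemming.
MIGRATION_STEMS = ("asylum", "refuge", "immigr")

TOKEN_RE = re.compile(r"[a-z]+")


def mentions_migration(speech: str) -> bool:
    """Return True if any token in the speech starts with a dictionary stem."""
    tokens = TOKEN_RE.findall(speech.lower())
    return any(token.startswith(MIGRATION_STEMS) for token in tokens)


def subset_speeches(speeches: list[str]) -> list[str]:
    """Keep only the speeches that contain at least one dictionary term."""
    return [speech for speech in speeches if mentions_migration(speech)]


if __name__ == "__main__":
    # Hypothetical toy corpus, not drawn from the actual data.
    toy_corpus = [
        "We must reform the immigration system.",
        "The Arctic National Wildlife Refuge should remain protected.",
        "I yield the balance of my time.",
        "Asylum seekers deserve a fair hearing.",
    ]
    kept = subset_speeches(toy_corpus)
    print(len(kept), "of", len(toy_corpus), "speeches retained")
```

Note that the wildlife refuge speech is retained in this sketch, which mirrors the side effect of the stem refuge that is later handled by coding such topics as irrelevant.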


3.3 Structural Topic Modeling 


Existing computational social science literature suggests that political disagreement 
and issue ownership can be understood by quantitatively analyzing relative empha- 
sis on different terms, ideas, and arguments in political texts (Baum 2012; Lowe 
2008; Sakamoto and Takikawa 2018). Based on US legislative data, Gerrish and Blei 
(2012) outlined legislators’ policy positions on specific issues and, using supervised 
machine learning, explored how the language of laws is correlated with political 
support. Combining political psychology and machine learning, topic modeling has 
been applied to the House of Representatives floor speeches to measure member 
personality traits (Ramey et al. 2016). Hierarchical topic modeling was applied to 
examine how Tea Party Republicans relate, in terms of their ideal points, to the 

6 Other parties have: 116,867 (NDP), 64,845 (Bloc Quebecois and related parties), 38,132 (Reform 
Party), 21,003 (Canadian Alliance), 11,928 (Progressive Conservative), 3573 (Green), 1865 
(Independent). 


7 As a robustness check, we conducted the analysis with a much broader dictionary, which delivered 
similar results. However, a minimalistic version of the dictionary ensures that the speeches we 
include and topics resulting from the analysis truly belong to a migration-related policy area. 


8 An interesting consequence of stemming data like ours is that the word refugee is reduced to 
the stem refuge, which pulls in a number of wildlife preservation speeches, especially in the case 
of the United States, where the Arctic National Wildlife Refuge (ANWR) and oil exploration there 
were heavily contested during our time frame of analysis. However, as wildlife refuge issues are 
not of relevance for the analysis of human migration, we classify wildlife refuge related issues as 
“irrelevant.” In the following section, we explain more on how we coded this and other topics to 
ensure focus on migration policy-related issues. 

remaining Republican representatives in Congress (Nguyen et al. 2015). A 
similar hierarchical modeling approach was used to analyze the political priorities 
emphasized in US Senate press statements (Grimmer 2010). However, existing 
studies have not extended these computational approaches to study how 
issue ownership and policy framing affect legislative debates. To test the 
hypotheses outlined above, we use an unsupervised machine learning technique, 
specifically structural topic modeling. 

Topic modeling is an unsupervised machine learning technique that allows for 
the detection of topics in a text corpus that has not previously been manually 
analyzed and coded. The general idea behind topic modeling is that each word 
in a text has a certain probability of belonging to a topic and each document 
represents a mixture of topics (for our purposes, a document is a legislative speech). 
Structural topic modeling additionally allows for the assessment of the role of 
selected covariates in the latent topics detected through topic modeling. To analyze 
how political camps frame migration policy in legislative speeches, we employ 
structural topic modeling as implemented in the stm package (Roberts et al. 2017) 
for R (2018). As we are interested in understanding how topic prevalence (how 
frequently a topic is discussed across speeches) and topic content (how a topic is 
discussed) are affected by the ideological camp to which the politician belongs, 
stm is the optimal choice of software for this analysis. 

The stm package builds on Latent Dirichlet Allocation (Blei et al. 2003) and 
also offers spectral initialization, which relies on a non-negative matrix 
factorization of the word co-occurrence matrix. Following the literature on spectral 
decomposition with particularly large datasets (more than 10,000 units in the 
vocabulary), we opt for this approach (Roberts et al. 2017). Using the 
migration-related subset of the parliamentary speeches, we process the data by 
removing custom and built-in stop words as well as words that appear in only one 
document. We also take the standard steps of lowercasing, stemming, and removing 
punctuation to clean our corpus. 
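
The cleaning steps listed above can be illustrated with a short Python sketch. It is not the stm preprocessing routine used in the analysis: the stop-word list is a tiny hypothetical sample, and the plural-stripping helper is an assumed placeholder for a real stemmer.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; the built-in and custom lists used in
# the actual analysis are not reproduced here.
STOP_WORDS = {"the", "a", "of", "to", "and", "is", "we", "in"}


def crude_stem(token: str) -> str:
    """Strip a trailing plural 's'; a placeholder for a proper stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def tokenize(text: str) -> list[str]:
    """Lowercase, drop punctuation and stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def preprocess(documents: list[str]) -> list[list[str]]:
    """Tokenize documents and drop terms that occur in only one document."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    return [[t for t in doc if doc_freq[t] > 1] for doc in tokenized]


if __name__ == "__main__":
    docs = [
        "Refugees need protection.",
        "The refugee system is broken.",
        "Borders and protection.",
    ]
    for cleaned in preprocess(docs):
        print(cleaned)
```

Dropping terms that occur in a single document shrinks the vocabulary considerably while removing words that cannot contribute to co-occurrence patterns across speeches.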

Moreover, using the advantages of the structural topic modeling approach, we 
specify the interaction of the ideological position of the speaker and the date 
of the speech as the prevalence covariate and the ideological position of the 
speaker as the content covariate. The prevalence covariate allows us to analyze 
how the selected attributes from the linked metadata affect the contribution of 
each topic to the documents. The content covariate makes it possible to consider 
how the metadata attributes affect which words are prominent in each topic. 
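
For orientation, the roles of the two covariate types can be written in the general notation of the structural topic model. This is a sketch of the standard formulation found in the stm literature, not a specification taken from this chapter. The symbols are introduced here for illustration only: $X_d$ stands for the prevalence covariates of speech $d$ (ideological camp, date, and their interaction) and $y_d$ for its content covariate (ideological camp).

```latex
\begin{align}
\theta_d \mid X_d &\sim \mathrm{LogisticNormal}(X_d \gamma,\; \Sigma) \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d) \\
w_{d,n} \mid z_{d,n} = k &\sim \mathrm{Multinomial}(\beta_{d,k}) \\
\beta_{d,k,v} &\propto \exp\!\left(m_v + \kappa^{(t)}_{k,v} + \kappa^{(c)}_{y_d,v} + \kappa^{(i)}_{y_d,k,v}\right)
\end{align}
```

Here $\theta_d$ is the vector of topic proportions for speech $d$, $z_{d,n}$ is the topic assigned to its $n$-th word $w_{d,n}$, $m_v$ is the baseline log frequency of word $v$, and the $\kappa$ terms are deviations attributable to the topic, the content covariate, and their interaction. The prevalence covariates thus shift how much of each topic a speech contains, while the content covariate shifts which words express a given topic.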

To shed light on the empirical plausibility of the outlined hypotheses, we proceed 
to develop two models: one for the speeches in the US House of Representatives and 
the other for the speeches in the Canadian Parliament. Both models were defined 
to include 20 topics. In selecting the number of topics, we tested several other 
options, which provided either a more nuanced or a cruder understanding of migration 
policy and the relevant subcomponents outlined in the speeches. While the stm package 
offers a technical solution for determining the most informative number of topics, we 
instead chose the number on theoretical grounds. We opt for 20 topics 
for two reasons. First, it allows a fairly nuanced overview of the topic subcomponents 
that, as the literature suggests, are debated along with the core migration issues 
(e.g., culture, economy, and security). Second, it also enables us to distinguish 
a humanitarian subcomponent. Previous work on migration issue ownership 
largely focuses on the European continent and on the economic, cultural, and security 
aspects of migration (Odmalm 2014). The humanitarian aspect, while certainly present in 
the media and public space, is packaged as a legalistic discussion of asylum 
rights and has to an extent been left out of existing studies focusing on migration 
policy in the European context. Our model allows us to explore the framing of this 
aspect as well. 


3.4 Labeling and Categorizing Topics 


In order to conceptualize the differences between the United States and Canadian 
speeches descriptively over the approximately 20 years in our study, we categorize 
the 20 topics from each of the two country models into six broader areas or 
categories that crisscross lenses of issue ownership: the Economy, Culture, Security, 
Human Rights, Migration Core, and an Irrelevant category. This step is necessary as 
unsupervised machine learning does not guarantee interpretable results that are 
comparable across corpora. More concretely, a raw topic from the Canadian 
corpus is not directly comparable to a raw topic from the US corpus without human 
interpretation and validation to ensure the topics are reasonably alike. The grouping 
stage by human coders accomplishes this goal. We categorize the model outputs 
for both countries based on the most frequent policy areas that the models found. 
While descriptive in nature, these summary statistics allow us to draw comparisons 
between our two cases and better understand the nature of the topics within our data. 

To ensure objectivity and coherence between the two countries, while taking into 
account the case study expertise of the team working on this paper, each topic was coded 
by two of the three authors. If the coding differed, the third author (and case study 
expert) adjudicated and assigned the final label.9 
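
The coding rule just described (two coders per topic, with a third author adjudicating disagreements) amounts to a simple reconciliation procedure. The Python sketch below is only an illustration of that logic; the function name, the data layout, and the example labels are hypothetical and not taken from the authors' materials.

```python
from typing import Optional

CATEGORIES = {
    "Economy", "Culture", "Security",
    "Human Rights", "Migration Core", "Irrelevant",
}


def resolve_label(coder_a: str, coder_b: str,
                  adjudicator: Optional[str] = None) -> str:
    """Return the agreed category, or the adjudicated one on disagreement."""
    for label in (coder_a, coder_b):
        if label not in CATEGORIES:
            raise ValueError(f"unknown category: {label}")
    if coder_a == coder_b:
        return coder_a
    if adjudicator is None:
        raise ValueError("coders disagree and no adjudicated label was given")
    if adjudicator not in CATEGORIES:
        raise ValueError(f"unknown category: {adjudicator}")
    return adjudicator


if __name__ == "__main__":
    # Hypothetical codings for two topics: (coder A, coder B, adjudicator).
    codings = {
        "Topic A": ("Security", "Security", None),
        "Topic B": ("Economy", "Culture", "Economy"),
    }
    for topic, (a, b, third) in codings.items():
        print(topic, "->", resolve_label(a, b, third))
```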


4 Results 


We divide the results section into four parts. We first outline the 20 topics our 
model has produced to give readers a better understanding of the data we analyzed. 
Second, we analyze which migration-related topics can be associated with either 
of the two ideological camps. Here, we disregard time and focus on the entire 
dataset. Third, we focus on the changes in topic salience across time for a number of 
theoretically relevant topics. Fourth, we focus on understanding how parties frame 


9 Marcella Morris for the USA and Tyler Amos for Canada. 

migration policy through the selection of representative keywords from individual 
speeches. Each of these steps allows for the testing of our hypotheses from a 
different perspective. For each of the steps, we start with the United States, which 
we then compare with Canada. 


4.1 Topics in the USA and Canada 


Figure 1 shows the number of topics that we coded under the broader categories 
of the Economy, Culture, Security, Human Rights, Migration Core, and Irrelevant 
topics. Within the Migration Core category, across both cases, we see topics related 
to acts of immigration by humans crossing borders and the law enforcement efforts 
involved in border protection but also the legal parameters lawmakers wrestle 


[Figure 1: bar chart titled "Topic Classification" showing the count of topics (x-axis: Count, 0 to 6) in each category (Security, Migration Core, Human Rights, Economy, Culture, Irrelevant) by country (Canada, United States).] 


Fig. 1 Topic categorization: migration subsample for the US House of Representatives and the 
Canadian House of Commons (1994-2016) 

with, such as regulation of visas and requirements for citizenship. Within the 
economic topics, we categorize budget, welfare programs,10 international trade, job 
programs, and infrastructure-type topics, which arise as a feature of the discussions 
about the impact of immigration. Under Culture, we find topics related to welfare, 
religion, aspirational descriptions of citizenship, multiculturalism, and the legacy of 
immigration for the national identity. The Security and Human Rights categories 
are clearer than some of the others. For Security, we include the topics that focus on 
terrorism, international conflicts, or wars, while Human Rights covers topics that 
include humanitarian aid, violence against women, and refugee rights. 

It should be noted that we categorized a number of topics as “Irrelevant.” This 
category mostly includes topics composed of parliamentary speech and procedural 
language required by the chambers from which the speeches originate.11 

Table 1 presents the 20 topic outputs for the United States and Canada along with 
the coding into broader categories. Asterisks denote truncated words. 
We stemmed our corpora in order to better match words; here, illeg* indicates that 
the entry covers all words that share the root illeg, regardless of their 
ending, so that illegal, illegally, and illegals are all captured by a single 
entry. We list the topics in the same order as the model produced them, 
making it easier for readers interested in replicating our steps to follow 
our coding of the topics. In the following pages, we unpack these differences further 
through the lens of ideological camps, but the differences in the raw data provide 
helpful context for the broad questions we tackle, as there are clear differences 
between the two countries, their politics, and debates. 

In addition to the coding scheme used to group topics produced by the model in 
six broader categories, Fig. 2a, b list topics according to their proportion across 
speeches. In the next section, we present differences in topic salience between 
ideological camps in the USA and Canada. 


4.2 Topic Association by Ideological Block 


We first focus on topic prevalence in the US House of Representatives. This answers 
the question of which migration-related topics are associated with the Democrats or 
Republicans. Out of the 20 topics from the model, we coded a number as 
irrelevant for the analysis, as they mostly deal with the procedural aspects 
of the parliamentary debates.12 Keeping the time component of the structural topic 
model fixed, we first analyze which migration-related topics the two US parties 


10 The economic welfare programs comprise spending and job training programs from a budgetary 
perspective while cultural welfare topics include social safety-net-type programs. 


11 We include the wildlife refuge topics which were present in the United States’ and Canadian 
topics in the “Irrelevant” category. 


12 A list of topics and words is available in Table 1a, b. 

Table 1 STM topic output and classification 

(a) United States 

| No. | Label | Topic output (marginal highest probability output) | Category |
|---|---|---|---|
| 1 | Immigration types | immigr*, illeg*, countri*, legal, american, will, state, come, unit, law, citizen, famili*, reform, visa, system, million, nation, citizenship, children, status | Migration Core |
| 2 | Law enforcement | law, enforc*, feder*, state, alien, crimin*, local, crime, immigr*, offic*, illeg*, depart, justic*, polic*, communiti*, will, citi*, case, deport, attorney | Migration Core |
| 3 | Procedural 1 | gentleman, yield, thank, texa*, california, madam, consum*, act, rank, subcommitte, gentlewoman, judiciari*, minut*, bipartisan, may, balanc*, reform, chair, distinguish, section | Irrelevant |
| 4 | Wildlife refuge | refug*, nation, land, wildlif*, park, area, protect, water, will, servic*, forest, state, fish, feder*, conserv*, acr*, manag*, resourc*, public, legisl* | Irrelevant |
| 5 | Voting | presid*, vote, constitut*, congress, law, court, elect, execut*, state, power, action, act, will, voter, suprem*, senat*, right, obama, district, branch | Culture |
| 6 | Multicultural America | american, nation, state, communiti*, world, unit, histori*, honor, asian, america, great, countri*, first, freedom, serv*, day, contribut*, pacif*, live, island | Irrelevant |
| 7 | Human trafficking | women, violenc*, victim, protect, act, state, right, will, abus*, administr*, domest*, american, report, traffick*, terrorist, crime, countri*, human, one, kill | Human rights |
| 8 | Foreign aid | million, fund, assist, provid*, program, include*, aid, state, intern, will, increas*, billion, effort, request, help, develop, countri*, foreign, guam, appropri* | Human rights |
| 9 | Procedural 2 | one, say, will, countri*, great, thing, state, serv*, life, talk, nation, come, live, district, famili*, now, citi*, school, first, honor | Irrelevant |
| 10 | Security | secur*, border, homeland, depart, nation, terrorist, will, protect, need, fund, agent, state, agenc*, patrol, guard, intellig*, terror, commiss*, enforc*, attack | Security |
| 11 | Procedural 3 | say, dont, now, come, thing, one, talk, will, job, back, countri*, got, american, that, look, america, let, happen, money, tri* | Irrelevant |
| 12 | Budget | republican, tax, budget, will, american, cut, care, health, billion, pass, spend, democrat, congress, reform, pay, major, vote, money, job, cost | Economy |
| 13 | Oil production | energi*, oil, gas, drill, will, price, percent, nation, state, need, product, use, develop, water, environment, increas*, million, compani*, american, arctic* | Economy |
| 14 | Procedural 4 | amend, chairman, rule, provis, will, act, requir*, appropri*, fund, languag*, state, legisl*, provid*, offer, debat*, author, process, report, confer*, member | Irrelevant |
| 15 | Education | educ*, school, will, america, children, state, one, american, need, unit, student, law, come, nation, number, million, now, countri*, problem, place | Culture |
| 16 | International conflicts | state, unit, peac*, war, will, world, nation, israel, presid*, forc*, militari*, must, refuge*, countri*, resolut*, genocid*, now, intern, one, continu* | Security |
| 17 | Threats | border, iraq, state, mexico, drug, unit, war, militari*, will, come, troop, patrol, afghanistan, one, mexican, illeg*, forc*, nation, iraqi, soldier | Security |
| 18 | Welfare | program, children, state, health, educ*, care, will, provid*, need, famili*, percent, worker, child, school, benefit, food, servic*, employ, help, english | Economy |
| 19 | Humanitarianism | refuge*, human, right, state, unit, china, haiti, vietnam, religi*, freedom, will, polici*, countri*, cuba, persecut*, cuban, polit*, democraci*, haitian, asylum | Human rights |
| 20 | Procedural 5 | will, legisl*, american, opportun*, believ*, need, america, hope, colleagu*, trade, abl*, deal, good, issu*, countri*, state, forward, agreement, problem, congress | Irrelevant |

Table 1 (continued) 

(b) Canada 

| No. | Label | Topic output (marginal highest probability output) | Category |
|---|---|---|---|
| 1 | Budget | conserv*, liber*, will, prime, budget, elect*, parti*, money, cut, public, now, chang*, job, one, say, vote, tax, promis*, polit*, noth* | Economy |
| 2 | Humanitarianism | refuge*, humanitarian, syrian, help, assist, million, will, countri*, syria, need, provid*, aid, intern, crisi*, communiti*, situat*, effort, region, resettl*, organ* | Migration Core |
| 3 | First nations | nation, will, aborigin*, land, first, agreement, treati*, protect, water, environ*, environment, throne, act, speech, area, park, territori*, develop, chang*, climat* | Irrelevant |
| 4 | Procedural 1 | will, member, parti*, opposite*, parliament, debat, vote, legisl*, reform, report, liber*, stand, consult, ask, made, amend, act, day, elect, say | Irrelevant |
| 5 | Immigration types | immigr*, citizenship, applic*, famili*, canadian, resid*, process, foreign, worker, countri*, system, program, will, come, temporari*, perman*, visa, citizen, number, offic* | Migration Core |
| 6 | Citizenship | act, court, citizenship, law, right, legisl*, canadian, amend*, will, citizen, case, charter, person, process, chang*, one, decis*, rule, claus*, may | Culture |
| 7 | Welfare | women, health, care, poverti*, need, senior, program, will, social, live, hous*, medic*, equal, system, incom*, feder*, provinc*, servic*, access*, countri* | Economy |
| 8 | Law enforcement | crimin*, crime, victim, will, justic*, law, sentenc*, offenc*, serious, marriag, women, person, commit, forc*, deport, act, polic*, protect, legisl*, violenc* | Security |
| 9 | Provincial concerns | quebec, feder*, provinc*, languag*, bloc, will, french, nation, offici*, one, cultur*, popul*, constitut*, québécoi*, must, provinci*, jurisdict*, francophon*, canadian, recogn* | Culture |
| 10 | Asylum process | refuge*, countri*, system, immigr*, claim, appeal, will, claimant*, protect, process, status, decis*, asylum, board, case, determin*, arriv*, need, divis*, safe | Migration Core |
| 11 | History of immigration | canadian, countri*, communiti*, world, nation, histori*, one, great, cultur*, immigr*, day, proud, first, divers*, societi*, contribut*, valu*, recogn*, will, live | Culture |
| 12 | Family support | famili, children, tax, child, parent, live, communiti*, young, ride, will, home, school, help, one, care, incom*, pay, need, centr*, member | Economy |
| 13 | Investment in Jobs | will, program, budget, tax, need, job, invest, econom*, economi*, fund, plan, help, busi*, billion, provid*, increas*, million, feder*, employ, benefit | Economy |
| 14 | Privacy or security | inform, will, act, canadian, privaci*, public, servic*, person, census, data, state, use, provid*, air, passeng*, access, may, legisl*, concern, one | Security |
| 15 | Human trafficking | human, right, intern, traffick*, protect, countri*, smuggl*, will, immigr*, exploit*, organ*, freedom, include*, trade, abus*, convent*, victim, smuggler, crimin*, law | Human rights |
| 16 | Economic sectors | industri*, will, trade, agricultur*, product, region, communiti*, farmer, ride, compani*, provinc*, job, agreement, atlant*, busi*, area, coast, sector, island, nova | Economy |
| 17 | Security | secur*, border, state, will, terrorist, unit, terror, agenc*, nation, offic*, safeti*, american, organ*, custom, canadian, septemb*, need, protect, rcmp, polic* | Security |
| 18 | International conflicts | will, forc*, war, militari*, peac*, must, intern, mission, world, nation, canadian, conflict, state, secur*, unit, iraq, afghanistan, action, countri*, nato* | Security |
| 19 | Procedural 2 | countri*, one, say, come, talk, thing, will, look, problem, need, deal, happen, back, tri*, someth*, good, realli*, colleagu*, ask, kind | Irrelevant |
| 20 | International trade | trade, countri*, intern*, polici*, canadian, develop, will, world, foreign, econom*, depart, per, agreement, cent*, state, unit, must, interest, canada, global | Economy |


[Figure 2: two horizontal bar charts ranking the 20 topics of each model by expected topic proportion (x-axis: Expected Topic Proportions, 0.00 to 0.30). Panel (a): Top Migration Topics in the US Congress. Panel (b): Top Migration Topics in the Canadian House of Commons.] 


Fig. 2 Expected proportion of migration-related topics in legislative speeches in the US Congress and 
Canadian House of Commons. (a) Top migration topics in the US Congress. (b) Top migration 
topics in the Canadian House of Commons 


emphasize. Figure 3a provides a visualization of the expected topic prevalence 
contrasted for Republicans and Democrats, excluding topics coded as irrelevant. 
The sequence of the topics follows the categorization introduced in the previous 
subsection and allows comparison across the two cases: Migration Core, Security, 
Economy, Culture, and Human Rights. Republicans dominate the discussions on 
the types of immigration, law enforcement, security, and threats when compared 
to Democrats. Budgetary aspects, use of resources, and international conflicts fall 
more on the side of the Democrats. Both parties place equal emphasis on multiculturalism 
in the USA, welfare, and the education aspects of migration. 
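
The logic of contrasting topic prevalence between two camps can be illustrated with a simplified, purely descriptive Python sketch. It is not the regression-based estimate with uncertainty that the stm package produces; it only averages per-speech topic proportions within each group and takes the difference. The function names and the toy proportions are hypothetical.

```python
from collections import defaultdict


def mean_topic_proportions(doc_topics, groups):
    """Average each topic's proportion over the documents of each group."""
    sums = {}
    counts = defaultdict(int)
    for theta, group in zip(doc_topics, groups):
        if group not in sums:
            sums[group] = [0.0] * len(theta)
        sums[group] = [s + p for s, p in zip(sums[group], theta)]
        counts[group] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


def prevalence_contrast(doc_topics, groups, group_a, group_b):
    """Per-topic difference in mean proportion (group_a minus group_b)."""
    means = mean_topic_proportions(doc_topics, groups)
    return [a - b for a, b in zip(means[group_a], means[group_b])]


if __name__ == "__main__":
    # Toy proportions for two topics across four speeches (hypothetical).
    doc_topics = [[0.75, 0.25], [0.75, 0.25], [0.25, 0.75], [0.25, 0.75]]
    groups = ["R", "R", "D", "D"]
    # Positive values: the topic is more prevalent in "R" speeches.
    print(prevalence_contrast(doc_topics, groups, "R", "D"))
```

A positive contrast for a topic corresponds to that topic falling on one party's side of such a plot, a negative contrast to the other party's side, and values near zero to topics that both camps emphasize about equally.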

In Canada, we see a similar picture, with differences in the topic association being 
smaller between the two blocks (see Fig. 3b). Topics such as immigration type 
and legal aspects of migration policy, which we classified in the Migration Core 
area, fall between the two ideological blocks. Moreover, humanitarianism, human 
trafficking, security, and international conflicts are equally associated with liberals 
and conservatives. A number of topics traditionally owned by liberals, such as 
budget, welfare, and family support, are indeed more strongly associated with 
liberals. Historical migration with an emphasis on Canadian multicultural nation 

[Figure 3, two panels. (a) Migration Topics in the US Congress. (b) Migration Topics in the Canadian Parliament.] 

Fig. 3 Expected Migration-related Topics Proportion between Parties in the US House of 
Representatives and the Canadian House of Commons. The coefficients refer to the expected topic 
proportion of the text corpus determined as a function of the selected variables. In this case, the 
topic prevalence for each of the migration-related topics is contrasted for two parties, Republicans 
and Democrats. (a) Top migration topics in the US Congress. (b) Top migration topics in the 
Canadian Parliament 


and debates on jobs fall to the side of the conservatives. We see a contrast in 
the association of the budget and jobs topics: while the budget aspect of migration 
is more salient for liberals, the jobs aspect is more associated with conservatives. 
Topic association in both the USA and Canada is aligned with our hypotheses. 
Ideological camps tend to place more emphasis on the subcomponents of migration 
policy that they traditionally own. We also show that, as expected, 
the differences in topic association are, due to weaker polarization, smaller in 
Canada than in the USA. 


4.3 Topic Prevalence Across Time 


Figure 4 illustrates how House members in the United States discussed migration 
topics from 1994 to 2016. To understand how the emphasis on migration-related 
topics changes across time, we select four topics: two topics from the core migration 
area, one topic representing issues traditionally owned by Democrats, and one by 
Republicans. The four selected topics are the following: an immigration type topic, 
an immigration enforcement topic, a security-related topic, and a human trafficking 
topic.13 For each of these topics, we analyze how topic prevalence has changed 
across time and party. 

First, we focus on the types of immigration topic. The topic deals with different 
types of immigration from the perspective of legality, legal status, and the procedures 
immigrants go through. Change in topic prevalence over time is visualized in Fig. 4. 
Three peaks are visible in the period between 1994 and 2016. The first peak occurred 
in the late 1990s, in the aftermath of the Yugoslav and Rwandan conflicts 
and during the NATO bombardment of Serbia and Montenegro in the 1999 
Kosovo crisis. By the beginning of the 2000s, the topic of immigration types became 
somewhat less salient, only to return to the House in the 2000s, likely as a result of 
the rise of the threat of terrorism related to the 9/11 terrorist attack. Both in the 
late 1990s and in the early 2000s, Republicans emphasized the topic more than 
Democrats. The last peak in topic salience occurs in 2014, which coincides with 
the Syrian conflict and, moreover, with the offensives after the breakdown of Kofi Annan's 
ceasefire attempt in 2013. While Republicans emphasized discussion of the types of 
immigration more during the first two peaks, Democrats saw it as more salient in 
the last period. 

The second topic we focus on is migration law enforcement. This topic deals 
with the violation of the legal framework related to migration and its consequences. 
After 2001, it becomes a more salient topic in the House of Representatives, and 
Republicans emphasize it more. From the law enforcement perspective, 2005 and 
2014 represent important points in the timeline. 

While both of these topics represent core migration area issues, our US model 
outlined another subcomponent of the migration topic, which has, to our knowledge, 
not been discussed extensively in existing studies: human rights or, more 
specifically, human trafficking. This topic refers to issues related to crime and abuse, 
particularly emphasizing women and trafficking crimes. In line with our expectations, 
Democrats appear to own this issue from the mid-1990s through 2015, as they place 
greater emphasis on this topic.14 


13 For an overview of the words constituting each topic, consult Table 1a, b. 


14 The change in prevalence on the human trafficking topic from Democrats to Republicans around 
2015 may be a feature of the campaigns for the 2016 Presidential election, the increased migration 
(or focus on migration) as a result of the Central American crises, or another set of issues. Future 
work could explore such changes over time in more detail. 

Fig. 4 Migration-related Topics Association between Parties in the US House of Representatives 
(1994-2016). We report the expected topic proportion rather than the simple topic proportion because it is 
an approximation based on the default STM settings, which incorporate uncertainty in the proportion 
estimates via the method of composition (Roberts et al. 2017) 


We now turn our attention to the security graph in Fig. 4 as a subcomponent along 
which migration policy is often discussed. This topic refers to border protection as 
well as security issues like terrorism. Security is an issue traditionally associated 
with Republicans, which can also be seen in Fig. 4's Security panel that illustrates 
the security topic from our US model. Republicans emphasize the security topic 
more until the second part of the 2000s when it becomes more salient for Democrats. 

Using the same theoretical approach for Canada as for the US model, we again 
select four topics to include two core migration area topics, one issue traditionally 
owned by liberals and one owned by conservatives. Moreover, the selected topics 
are also similar content-wise to the topics selected for the US case, thus facilitating 
a comparative case study approach. The selected topics are the following ones: 
an immigration type topic, an asylum-process-related topic, a human trafficking 
topic,15 and a security-related topic (see Fig. 5). 


Fig. 5 Migration and Migration-related Topics across Time in the Canadian House of Commons 

As expected by our hypotheses, the contrast in migration-related topic prevalence 
between the two Canadian political camps appears smaller than between the US 
political camps across time. There are some periods where trends for the two groups 
of parties diverge, notably with respect to asylum and security, but overall both 
parties display quite similar trends. This contrasts with the American case where 
with the human trafficking topic, for example, the Democrats’ topic prevalence 
trend is completely different from the Republicans’. When there are differences in 
emphasis, the two ideological camps emphasize an issue that they are traditionally 
considered to own (e.g., conservatives with security), just as our hypotheses suggest. 


15 An explanation for the spike in conservative discourse on the Human Trafficking topic around 
2010 is not readily apparent based on our current reading of the relevant literature. Future work 
could explore such anomalies in more detail. 
Analysis of both US and Canadian topics across time also shows that differences in 
prevalence fluctuate over time. While the interpretation of the reversals surpasses 
the scope of this chapter, the literature suggests that framing strongly depends on 
the external events and the salience of the topic in the public discourse (Lefevere 
et al. 2017). We suggest the fluctuations could be a result of pertinent legislative 
acts, important political events, and majorities in the lower houses of the legislative 
bodies in the USA and Canada. 


4.4 Migration Policy Framing: Word Use 


In the earlier two subsections, we discussed how the two ideological blocks 
quantitatively contribute to the migration topics, first by taking the whole dataset 
into account and then by emphasizing the changes across time. In this subsection, 
we focus on the results from the perspective of how the two ideological blocks 
discuss migration policy. We analyze the words that the liberal camp uses as opposed 
to the conservative camp, which provides an in-depth understanding of their framing 
strategies. Figure 6a, b are visualizations of the word choice comparisons we present 
in this section. They represent the types of words from the Human Trafficking topic 
from the United States' and Canadian Speeches, respectively. We provide 
visualizations for the other topics in the appendix. We first analyze how Republicans and 
Democrats in the US House of Representatives discuss migration-related topics.16 


Fig. 6 Word choice comparison plot for Human Trafficking topic. (a) US Human Trafficking (7), 
Republican vs. Democrat. (b) Canada Human Trafficking (15), conservative vs. liberal 


With regard to types of immigration, Republicans emphasize “illeg,” “american,” 
and “state,” while Democrats emphasize “immigr*” and “family.” Similarly, when 
discussing enforcement of immigration policy, Republicans emphasize “illeg,” 
“alien,” “crime,” “criminal,” and “immigr,” whereas Democrats emphasize “local” 
and “offic*”. We interpret this as the parties framing policies by leaning into 
migration umbrella issues they own. In this case, Republicans own law enforcement 
and crime. This supports Odmalm's (2011a) theoretical approach, which suggests 
that parties debate migration policy in ways that allow them to avoid the competences 
of their opponents and focus on their own strengths. 

The topic of human rights issues, with a specific focus on women’s rights 
and rights of refugees, is another interesting case. When discussing gender-based 
violence, Democrats use words such as “victim,” “women,” “violence,” “traffick*”, 
“domes*”, “abuse,” as well as “protect,” “right,” and “act.” Republicans emphasize 
“islam,” “terrorist,” “muslim,” as well as “american” and “kill.” In this context, 
Republicans appear to emphasize perceived threats, while Democrats appear to 
be adopting a women’s rights lens—areas which play to their perceived issue 
ownership. When discussing geography with respect to Human Rights, Democrats 
emphasize more cases in which refugees are entitled to protection—using words 
such as “haiti” or “haitian.” Republicans use a vocabulary of Human Rights to 
discuss the countries with authoritarian systems of government, such as Cuba and 
China. Both parties equally emphasize freedom. 

Considering the topic of security, Republicans frame the policy to outline 
their ownership over it: they emphasize “secur*”, “terrorist,” and “nation,” while 
Democrats speak about the procedural issues related to the border, such as “fund*” 
and “agent*”. Republicans discuss the topic of threats within the frame of “border*”, 
“patrol,” “illeg*”, and “Mexico,” while Democrats talk about “troops” and 
“militari*”, in connection to Iraq and Afghanistan. The most striking is the topic 
of international conflicts, where Democrats focus on “peac*”, while Republicans 
discuss “war,” “militari*”, and “Iraq.” 

Culture, as one of the topics often emphasized as part of the broader migration 
theme, is also represented in parliamentary speeches. Both Republicans and 
Democrats emphasize “nation” in this context, but Republicans talk about “great*”, 
“freedom*”, and “serv*”, while Democrats emphasize “communiti*”, “histori*”, 
as well as “asian” and “pacif*”. Our interpretation is that Democrats emphasize 
the multicultural aspect of the USA as a country founded by immigrants, while 
Republicans emphasize freedom and its role in American culture. On 
economic topics, Republicans discuss the budget in relation to taxation and spending 
on healthcare, while the Democrats refer more explicitly to “cuts,” “budget,” and 
“republican,” perhaps referring to tax or spending cuts Republicans are introducing. 




16 See the online Appendix for a graphical illustration of this point. 
Now turning to the Canadian context, we analyze how conservatives and liberals 
frame migration-related topics. In Canada, with regard to legal aspects of 
immigration, conservatives emphasize “citizenship” and “canadian*”, while liberals 
focus on “immigr*”, “applic*”, and “process.” When discussing the security 
subcomponent of migration policy, liberals focus on “secur*” and “border*”, 
while conservatives emphasize “terrorist*”. We analyze this framing with our 
theoretical approach in mind: conservatives tend to emphasize issues which they 
own, such as security and protection from terrorism, but compared to the USA, 
there is a weaker distinction between liberals and conservatives with respect to their 
framing of migration discussions. 

As in the USA, the issue of human rights, with a specific focus on human 
trafficking, is also outlined in our Canadian model. When discussing this topic, 
conservatives emphasize the connection between “immigr*” and “smuggl*”, while 
liberals emphasize “human*”, “traffick*”, and “right” (see Fig. 6b). We suggest 
that here, too, we can identify patterns of framing based on issue ownership. For 
liberals, the focus is on trafficking, victims, and their rights, while conservatives 
dedicate more attention to the crime of smuggling in connection with immigration. 
However, the differences are not as striking as those between Republicans 
and Democrats in the USA. When discussing humanitarian aspects of migration, conservatives 
emphasize “syria*”, “syrian*”, “help,” “humanitarian,” and “assist,” while for 
liberals it is more about “refuge*”. 

Similar again to the USA, the culture topic that emerges from our Canadian 
model seems to revolve around the multicultural origins of Canada and diversity 
of the origins of its inhabitants. However, there are no clear differences in the 
framing of this subcomponent between the two blocks. Conservatives emphasize 
“canadian” and “histori*”, while for liberals the focus is on “community,” “nation,” 
and “great,” all of which also occur in the US debates. 

Finally, migration-related economic topics are of great importance in the 
Canadian model. When discussing the labor market, conservatives use words such as 
“job,” “plan,” and “econom*”. For liberals, the emphasis is more on “budget,” 
“invest*”, “tax,” “program,” and “need.” Differences in policy framing are perhaps 
clearer when considering the aspect of welfare, where the conservatives focus on 
“tax,” “incom*”, and “pay,” while liberals emphasize “famili,” “children,” “child,” 
“parent,” and “communiti*”. 

Analysis of the differences between the two ideological blocks in word use 
within topics confirmed our expectations. In the USA, there are clear differences 
in the framing of the migration-related issues. When forced to discuss migration 
policy—over which neither block has clear issue ownership—political elites’ choice 
of words implies policy framing in a way that allows them to emphasize the 
subcomponent they have established ownership over. We find this pattern both in 
the USA and in Canada but the issue ownership differences in word choice are more 
pronounced in the USA than in Canada. In the next section, we briefly summarize 
our findings and conclusions. 
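
The word choice comparisons in this subsection come from the fitted structural 
topic model. As a much simpler illustration of the underlying idea, and not the 
procedure used in this chapter, the following sketch scores each stem by the 
smoothed log-ratio of its relative frequency in two camps. The function name, the 
smoothing value, and the toy token lists are hypothetical and serve only to show 
the mechanics. 

```python
import math
from collections import Counter


def emphasis_scores(tokens_a, tokens_b, smoothing=1.0):
    """Score each stem by the log-ratio of its smoothed relative frequency.

    Positive scores mean camp A uses the stem relatively more often,
    negative scores mean camp B does. This is a plain frequency contrast,
    not the STM-based comparison reported in the chapter.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    # Additive smoothing keeps the ratio finite for stems one camp never uses.
    total_a = len(tokens_a) + smoothing * len(vocab)
    total_b = len(tokens_b) + smoothing * len(vocab)
    scores = {}
    for stem in vocab:
        rate_a = (counts_a[stem] + smoothing) / total_a
        rate_b = (counts_b[stem] + smoothing) / total_b
        scores[stem] = math.log(rate_a / rate_b)
    return scores


# Toy input (invented for illustration, not taken from the speech corpus).
camp_a = ["illeg", "illeg", "immigr"]
camp_b = ["famili", "famili", "immigr"]
scores = emphasis_scores(camp_a, camp_b)
for stem in sorted(scores, key=scores.get):
    print(f"{stem:8s} {scores[stem]:+.2f}")
```

Stems with scores near zero are used at similar rates by both camps, which is 
the pattern one would expect for shared vocabulary such as “nation” in the 
culture topic. 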
5 Conclusion 


The chapter explored how ideological camps in the USA and Canada discuss 
migration policy in legislative speeches. We argued that political camps tend to 
avoid discussing migration directly as it represents a political cleavage encompass- 
ing multiple, opposing issues over which lines of ideological ownership are not 
yet clearly drawn. Modeling parliamentary discourse as an area where politicians 
sandbox their ideas, we explored how political camps frame migration policy in 
their speeches. 

Our analysis showed that politicians in the US and Canadian legislatures 
debate migration policy in a way that allows them to emphasize migration aspects 
over which their parties traditionally hold issue ownership. Specifically, our results 
confirmed that Democrats in the USA and the liberals in Canada, in their migration- 
related legislative speeches, stress issues they own—such as poverty, welfare, 
weapons control, and health. In contrast, US Republicans and Canadian conser- 
vatives instead focus on the external threats, security, and sovereignty aspects of 
migration policy as issues they traditionally own. 

Our results suggested that parties not only emphasize the issues they own 
more, they also frame migration in a way that emphasizes their competences over 
subcomponents of the broader policy area. Republicans and conservatives frame 
subcomponents of migration such as humanitarianism in a way that emphasizes 
issues traditionally owned by their political camp. Democrats and liberals do the 
same—they frame migration policy by focusing the debate on their strengths, even 
when discussing security threats for which their competition can be considered as 
more competent. 

Through our focus on differences in polarization between the case studies, we 
also found evidence that the differences in migration policy framing between the 
two political camps are more pronounced in the USA than in Canada. The chapter 
contributes to the existing migration policy literature by expanding the research on 
legislative speeches in the USA and Canada, two cases which have not yet been 
explored in this context. By showing that political actors tend to emphasize issues 
they own more in a polarized context, we expanded the research on the effects of 
polarization on policy framing in legislative speeches. 

Our exploratory study of migration framing in legislative debates in the USA and 
Canada has also enabled a better understanding of the humanitarian subcomponent of 
migration issues. While the humanitarian migration subcomponent is present in the 
public discourse, existing studies have neglected it, at least thus far. We suggest 
that future studies should emphasize this aspect of the policy. Particularly, future 
research could examine how it relates to other, already established components of 
the migration debate—such as the economy, security, and culture. 

The chapter represents a methodological contribution by providing a guide on 
how automated text analysis and in particular structural topic modeling can be 
used to study intricate and complex research questions from social sciences— 
such as rhetorical conflicts in framing migration policy in legislative speeches. 
Further extensions of this work should incorporate validation of the findings either 
through human coding of a part of the selected dataset or through comparison with 
existing qualitative studies. Expanding the time frame, and therefore incorporating 
more international conflicts, may clarify the relationship between conflict abroad 
and policy framing by domestic political elites. We hope this chapter encourages 
more researchers to use quantitative text analysis to explore social science research 
puzzles. 
Abstract Social norms can facilitate societal coexistence in groups by providing an 
implicitly shared set of expectations and behavioral guidelines. However, different 
social groups can hold different norms, and lacking an overarching normative 
consensus can lead to conflict within and between groups. In this chapter, we 
present an agent-based model that simulates the adoption of norms in two inter- 
acting groups. We explore this phenomenon while varying relative group sizes 
and homophily/heterophily (two features of network structure), and initial group 
norm distributions. Agents update their norm according to an adapted version 
of Granovetter's threshold model, using a uniform distribution of thresholds. We 
study the impact of network structure and initial norm distributions on the process 
of achieving normative consensus and the resulting potential for intragroup and 
intergroup conflict. Our results show that norm change is most likely when norms 
are strongly tied to group membership. Groups end up with the most similar 
norm distributions when networks are heterophilic, with small to middling minority 
groups. Highly homophilic networks show high potential intergroup conflict and 
low potential intragroup conflict, while the opposite pattern emerges for highly 
heterophilic networks. 
1 Introduction 


In this chapter, we study the impact of network structure and initial group norm 
distributions on the process of arriving at a normative consensus between groups and 
the potential for intragroup and intergroup conflict that might emerge under different 
conditions. To this end, we first provide a brief theoretical overview on social 
norms, normative group conflict, and the process of finding consensus through social 
influence. Second, we give an overview on the role that network structure as well as 
the initial distributions of norms can play in this process. Specifically, we argue that 
homophily/heterophily (preference for forming connections to similar/dissimilar 
others) between members of different groups, relative group sizes, and the initial 
distribution of norms within groups are all important factors for reaching normative 
consensus, and consequently relevant determinants of conflict potential. Based on 
this reasoning, we develop an agent-based model that simulates social networks of 
agents from two different social groups where each agent holds one of the two social 
norms. In an adapted version of Granovetter’s threshold model (Granovetter, 1978), 
each agent updates its social norm by comparing the proportion of norms held by 
its immediate neighbors to an internal threshold drawn from a uniform distribution. 
Agents are thus “observing” the “openly displayed behavior” of their neighbors and 
adapt their own behavior accordingly if enough of their neighbors display a different 
norm. We apply this model to different network structures, defined by relative group 
sizes and homophily/heterophily between agents from different groups. This allows 
us to assess the impact of these structural network properties on the process of 
reaching normative consensus and associated conflict potential. In addition, we run 
our model for different levels of initial group norm distributions, so that we can 
also assess the influence of alignment (or independence) of norms and social group 
membership. We define and examine three relevant outcomes: the degree to which 
norm distributions change, the degree to which the difference in norm distributions 
between the two groups changes, and the potential for conflict within and between 
the groups. Lastly, we discuss our results with respect to their applicability, the 
limitations of our model, and possible directions for future research. 
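
As a rough illustration of the update rule described above, the following 
minimal sketch assumes two norms coded as 0 and 1, a strict "greater than" 
comparison against the threshold, and asynchronous updating in random order. 
These details, like all function and variable names, are assumptions made for 
the example and are not taken from the model specification. 

```python
import random


def update_norm(agent, norms, thresholds, neighbors):
    """Return the agent's norm after one threshold-based update."""
    nbrs = neighbors[agent]
    if not nbrs:
        return norms[agent]  # isolated agents keep their norm
    # Share of immediate neighbors displaying a norm other than the agent's.
    differing = sum(1 for n in nbrs if norms[n] != norms[agent]) / len(nbrs)
    # Switch to the other norm (0 <-> 1) once that share exceeds the threshold.
    if differing > thresholds[agent]:
        return 1 - norms[agent]
    return norms[agent]


def step(norms, thresholds, neighbors, rng):
    """Update every agent once, in random order (an assumed schedule)."""
    order = list(norms)
    rng.shuffle(order)
    for agent in order:
        norms[agent] = update_norm(agent, norms, thresholds, neighbors)
    return norms


rng = random.Random(42)
# Hypothetical four-agent network given as an adjacency list.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
norms = {0: 1, 1: 1, 2: 0, 3: 0}
# Internal thresholds drawn from a uniform distribution on [0, 1).
thresholds = {agent: rng.random() for agent in norms}
norms = step(norms, thresholds, neighbors, rng)
print(norms)
```

Repeating `step` until no agent changes its norm would give one possible 
stopping rule for such a simulation. 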


2 Social Norms 


Social norms can be defined as unwritten behavioral rules (Bicchieri and Mercier, 
2014) or "social standards that are accepted by a substantial proportion of the group" 
(Forsyth, 2018, 145). They are a shared set of situation-specific behaviors that 
facilitate social interaction by providing an implicitly shared set of expectations 
and behavioral guidelines (Bicchieri, 2006). Such behaviors can range from an 
implicit dress code at work, to the expression of religious and political symbols, or 
(not) interacting with other social groups. Norms are implicitly negotiated between 
members of a group and enforced through informal sanctions, such as gossip, 
censoring, or ostracism (Bicchieri, 2006). They are passed through generations via 
socialization processes in childhood (House, 2018) and are, in contrast to laws, 
not necessarily enforced by an institution. Norms come in multiple types. For 
example, prescriptive norms define behaviors that one should enact (e.g., "offering 
elderly people a seat on the subway"), while proscriptive norms define undesirable 
behaviors that one should avoid (e.g., "interrupting people while they speak"). The 
most important distinction for our purposes is between injunctive and descriptive 
norms. Injunctive norms focus on beliefs about how people should act, while 
descriptive norms are defined by the observation of how people actually do act 
(Melnyk et al., 2010; Cialdini et al., 1990). For instance, “everybody should recycle" 
is an injunctive norm, while the observation that many people do not recycle 
represents a descriptive norm (Cialdini et al., 1990). Both types of norms are 
important determinants of behavior, but previous research suggests that injunctive 
norms primarily elicit behavioral change by changing attitudes (Melnyk et al., 2010; 
Megens and Weerman, 2010), while descriptive norms directly impact behavior 
(Cialdini, 2007). In this chapter, we are interested in descriptive norms, because 
they are directly inferred from the observed behavior of others. Injunctive norms 
can differ from directly observed behavior, and can involve more complex cognitive 
processes (House, 2018), which are beyond the scope of our model. Therefore, 
when we are referring to social norms with respect to our model, we are specifically 
addressing descriptive social norms. 


2.1 Normative Conflict 


A large body of previous research has focused on the potential for positive impact of 
social norms on behavior. Predominantly, these studies were interested in changing 
individual beliefs or behavior by presenting normative information at odds with 
the individual's current beliefs or behavior. Examples include the reinforcement 
of non-delinquent behavior through the influence of peers (Megens and Weerman, 
2010), positive effects of punishment on cooperative behavior (Fehr and Gächter, 
2000), effects of social norms on compliance with vaccination programs (Oraby et al., 
2014), reduction of binge-drinking in college students (Haines and Spear, 1996), 
and littering (Cialdini et al., 1990). However, inconsistent norms do not only elicit 
behavioral change; they can lead to interpersonal and intergroup conflict (Hogg 
and Reid, 2006). The potential risk of such normative conflicts is especially high 
in multicultural contexts where different cultural groups must coexist (Wimmer, 
2013). A recent example of normative conflict in Europe is women wearing a 
veil to cover their face in public. This practice is a prescriptive social norm in 
some predominantly Muslim countries and it has elicited mixed reactions when 
immigrants engaged in the practice in their new countries (Kılıç et al., 2008). 
Some Western countries such as France, Belgium, and Switzerland have banned 
this practice. In France, lawmakers claimed that a ban was necessary to ensure 
"peaceful cohabitation" (Zeit Online, 2019). Likewise, in Germany, face veils have 
been the subject of controversy in recent years. For instance, the German Minister 
of the Interior stated "[... ] we reject this. Not just the headscarf, any full-face veils 
that only shows eyes of a person [...] It does not fit into our society for us, for 
our communication, for our cohesion in the society ...This is why we demand 
you show your face" (McKenzie, 2019). This backlash reflects an underlying 
normative conflict, with a large majority (81%) of Germans supporting a ban in 
public institutions and a substantial group (51%) even supporting a general ban. 
Only a minority of the national population (15%) indicate that they are not in favor 
of any kind of regulation (Infratest Dimap, 2018). 

However, such normative societal conflicts exist not only along established 
cultural and religious divides, but can cover a wide array of topics and elicit 
intergroup and intragroup conflicts (Hogg and Reid, 2006). For instance, gun 
ownership is a controversial normative debate within US society (Kleck, 1996), 
involving subgroups with different cultural orientations (Celinska, 2007). Abortion 
is another topic debated worldwide, with disagreements concerning women's rights, 
health care systems, and moral constraints (Marecek et al., 2017). Empirical 
research shows how the controversy around abortion leads to a polarization of 
opinions within Protestant and Catholic groups in US society (Evans, 2002). Other 
inconsistent norms can concern controversial national traditions such as Zwarte Piet 
(“Black Pete"), a folklorist character and helper of Sinterklaas (Santa Claus) in the 
Dutch culture. The character is typically displayed with blackface makeup, bright 
red lips, and colorful clothing. The display has been increasingly criticized as a 
racist stereotype, predominantly by minority and immigrant groups, while many 
native Dutch citizens argue that “Black Pete" is a positive character and part of their 
national tradition (Rodenberg and Wagenaar, 2016). In essence, inconsistent social 
norms within a larger collective have the potential to lead to intergroup, as well as 
intragroup conflict. Given trends of increasing globalization and migration, 
effectively resolving these normative conflicts is becoming a pressing priority for 
many societies. 


2.2 Finding Consensus 


Despite their potential for negative outcomes, normative conflicts are not an 
indication that a collective is inherently unfit to live together peacefully. On the contrary, 
they can be fundamental to the formation of social units at different scales. Georg 
Simmel defines shared consensus on social roles and their supporting norms as 
necessary features of human society (Simmel, 2009). Similarly, normative conflicts 
are frequently observed in the literature on group formation and described as a 
necessary step towards a common group identity. For example, in Tuckman's stage 
model of group development, the norming stage focuses on resolving disagreement 
and establishing a shared set of behavioral guidelines; it is a crucial step in 
the formation of an effective group (Tuckman, 1965). Some recent, empirically 
validated models such as the Normative Conflict Model (Packer and Miners, 2014) 
confirm this mechanism. According to the model, members strongly identified with 
the group are more likely to openly express dissent compared to weakly identified 
members (Packer and Miners, 2014). Dissenters help uncover the causes of the 
conflict and discuss possible solutions. To form an effective group with committed 
members, it is necessary to effectively resolve conflicts due to incompatible norms 
and to find a consensus on which most members agree. Failure to reach such a 
consensus might result in a lack of common group identity and task effectiveness, 
leading to the dissolution of the group (Tuckman, 1965). 

Interactions between people from different social groups are a steadily increasing 
occurrence in societies that are socially, economically, and culturally diverse 
(Arapoglou, 2012). Such diversity is likely to increase in the future, along with 
changing relations between majority and minority groups due to demographic and 
socioeconomic changes (Crul, 2016). As ongoing political and societal polarization 
in Western societies already demonstrates, incompatible social norms associated with 
different groups have the potential to elicit conflict (Fiorina and Abrams, 2008). For 
these reasons, we argue that it is crucial to understand the conditions enabling social 
groups to effectively reach a normative consensus and how this process relates to 
conflict potential within and between social groups. 


3 Network Structure and Group Norm Distributions 


Individuals do not adopt norms in isolation; the structure of their social environment 
is a key determinant of social behavior. The social networks in which we are 
embedded determine the kinds of people and behavior to which we are exposed, 
thereby shaping the descriptive norms we hold. Thus, the interpersonal processes 
which contribute to finding normative consensus (Neumann, 2008), as well as 
the intergroup and intragroup processes (Hogg and Reid, 2006), are crucially 
contextualized within networks of social interaction. Consequently, we argue that 
finding normative consensus is a continuous process of group members mutually 
exerting social influence (Cialdini, 2007) on each other until a relatively stable 
equilibrium is reached (Latané, 1981; Flache et al., 2017). This often requires that 
at least some individuals react to social influence exerted on them by their social 
networks by changing their norms. For instance, Kalesan et al. (2016) show how 
networks such as family and friends are the best predictors in forging a culture 
favoring gun ownership. As for the normative conflict of gay marriage in the USA, 
a longitudinal time-series study shows how the decision of the US Supreme Court 
in June 2015 eventually led to an increase in perceived social norms supporting 
gay marriage independently of individual attitudes (Tankard and Paluck, 2017). In 
short, the social networks people are embedded in appear to play a crucial role in 
the process of reaching a normative consensus within and between groups. 

In this chapter, we will focus on homophily/heterophily between people from 
different groups and relative group sizes as determinants of network structure, and 
on the initial distribution of norms within groups when they come into contact. 
3.1 Homophily and Heterophily 


Homophily is the tendency to preferentially connect and interact with similar 
others (McPherson et al., 2001), while heterophily is the tendency to preferentially 
connect and interact with dissimilar others (Lozares et al., 2014). Homophily has 
been observed extensively in many social networks, including school friendships 
(Stehlé et al., 2013), scientific collaborations (Jadidi et al., 2017), and online 
communications (Mislove et al., 2010). It is likely a manifestation of the similarity 
bias, a fundamental human tendency to like and value others that are similar to the 
self and to consequently be disproportionally influenced by them (Cialdini, 2007). 
For example, a controlled experimental study on the spread of a health innovation 
through social networks varied the level of homophily, showing that homophily 
significantly increased the overall adoption of new health behavior, especially 
among those in more clustered networks (Centola, 2010). Similar effects have been 
shown in diverse health behaviors in large social networks, such as the spread of 
smoking (Christakis and Fowler, 2008) and obesity (Christakis and Fowler, 2007). 
Since social influence is exerted through social ties in networks (Aral and Walker, 
2011; Lewis et al., 2012) and homophily/heterophily determines how these ties 
are formed, we argue that it is an important factor in the process of negotiating a 
normative consensus through mutual social influence. 
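
To make the role of homophily/heterophily in tie formation concrete, the 
following sketch generates a two-group random network in which a single 
parameter `h` shifts tie probability toward same-group pairs (`h` above 0.5) or 
cross-group pairs (`h` below 0.5). The generating rule, the parameter names, 
and the group sizes are assumptions chosen for illustration; the chapter's own 
network model may differ. 

```python
import random
from itertools import combinations


def make_network(n_minority, n_majority, h, density, rng):
    """Build a two-group random network with homophily parameter h in [0, 1].

    Assumed rule: a pair is tied with probability 2 * density * h if both
    nodes share a group and 2 * density * (1 - h) otherwise, so h = 0.5 is
    neutral, h = 1 fully homophilic, and h = 0 fully heterophilic.
    """
    groups = ["minority"] * n_minority + ["majority"] * n_majority
    edges = []
    for i, j in combinations(range(len(groups)), 2):
        same = groups[i] == groups[j]
        p = 2 * density * (h if same else 1 - h)
        if rng.random() < min(1.0, p):
            edges.append((i, j))
    return groups, edges


def same_group_share(groups, edges):
    """Fraction of ties that connect two members of the same group."""
    if not edges:
        return 0.0
    return sum(groups[i] == groups[j] for i, j in edges) / len(edges)


rng = random.Random(1)
for h in (0.1, 0.5, 0.9):
    groups, edges = make_network(20, 30, h, density=0.2, rng=rng)
    print(h, len(edges), round(same_group_share(groups, edges), 2))
```

Because the two groups differ in size, the share of same-group ties also 
depends on relative group sizes, not on `h` alone. 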


3.2 Group Size 


Almost no collective group is made up of completely homogeneous members. 
Instead, they consist of demographic subgroups, such as those defined by gender, 
nationality, or education (McPherson et al., 2001). Mostly, these subgroups are 
not equally sized, so that people are either part of a majority or minority group 
(Blau, 1977) with respect to a certain social category. The pervasive influence of 
majority opinions, customs, and norms is well established in theoretical accounts 
of group-based social influence (Latané, 1981). The dominant role of the majority 
has been experimentally validated in numerous studies replicating the seminal 
work by Asch (1951), both for individual social influence (Horcajo et al., 2010; 
Kundu and Cummins, 2013) and group influence (Meyers et al., 2000; Cohen, 
2003). Greater influence of the majority is generally assumed for acculturation 
processes of minority immigrants in host countries (Bourhis et al., 1997; Ward 
et al., 2010). Yet, other studies have demonstrated that under certain conditions, 
minorities can successfully exert social influence on the majority and consequently 
redefine the normative consensus in their favor (Hogg and Reid, 2006; Mugny and 
Papastamou, 1982; Nemeth, 1986). For these reasons, we argue that the sizes of 
interacting subgroups within a larger society are an important factor in the process 
of negotiating normative consensus. 


3.3 Initial Group Norm Distributions


Agreement on social norms is considered to be a part of the collective identity people 
derive from the social groups to which they belong (Hornsey, 2008; Hogg and Reid, 
2006). Norms vary, however, in how much they align with group membership. 
Even in the case of German opinions on face veils, a full 15% do not agree 
with the normative opinion to ban face veils (McKenzie, 2019; Infratest Dimap, 
2018). That is, despite sharing group membership, individuals disagree on this 
norm. Conversely, in the social group of Muslim immigrants in Germany, some 
will support the norm of face veils, while others will oppose it. People can hold 
the same norm on face veils even though they are from different social groups, 
or they can hold different social norms while belonging to the same social group. 
In terms of our example, there will be some Muslim immigrants agreeing with 
Germans who oppose face veils. There will also be some Germans agreeing with 
the Muslim immigrants who do not oppose face veils. In short, even in this case of 
strong consensus, group membership is not the single determinant of norms held 
on an individual level. Social norms are often aligned with group membership to a 
degree, but the two are not synonymous. 

This interplay of social group membership versus agreement in moral or 
normative issues has been shown to be influential in previous studies. For instance, 
the influence that a group exerts on individuals is not only a function of its size, 
but also of its unanimity, with stronger pressure towards conformity for more 
unanimous groups (Asch, 1956). Furthermore, studies have shown that people react 
more negatively to dissenters from their own in-group (Marques et al., 1988) and 
consequently punish them harder. The initial distribution of norms within groups 
thus seems to be important for negotiating a normative consensus, even though it is 
not necessarily influencing the structure of the social network. 


4 Agent-Based Model 


Agent-based modeling can be of particular interest to understand social phenomena 
because it enables researchers to study complex macro-level outcomes that emerge 
from a clearly defined set of micro-level processes (Macy and Willer, 2002; Flache 
et al., 2017). In addition, simulations allow us to systematically vary agents' 
behavioral rules or the circumstances in which they act (Squazzoni et al., 2014). 
In short, agent-based models help us to gain insight into the emergence of complex 
systems by systematically testing a variety of different parameters and the combined 
impact they exert on the emergent system (Macy and Willer, 2002). Previous 
research has extensively used agent-based models to study phenomena such as 
spatial segregation (Schelling, 1971), opinion diffusion (Lorenz, 2007), the adoption 
of innovation (Zhang and Vorobeychik, 2017), and cascade effects (Watts, 2002). 

For the purpose of modeling normative conflict in social networks with respect
to relative group sizes, homophily/heterophily, and group norm differences, we
developed a modular simulation framework based on a network generation algorithm
using preferential attachment, group size and homophily/heterophily (Karimi
et al., 2018), and Granovetter's threshold model (Granovetter, 1978). We utilized
R (R Core Team, 2019) for our model as it appears to be more widespread among 
the social science community than Python and offers more customizability, better 
parallelization, and scalability than NetLogo. Consequently, probabilistic processes 
in our model are implemented using the sample() function in R, which relies on 
the current system time to generate a seed for pseudo-random number generation. 
All code, documentation, and an animated visualization are available on GitHub 
(Kohne, 2019) under the MIT License. 


4.1 Simulating Norm Conflict 


In our agent-based model, we aim to simulate the impact of group size, 
homophily/heterophily between agents from different groups, and initial group 
norm distributions on the process of reaching normative consensus and resulting 
conflict potential. To this end, we generated networks with 2000 agents each, where 
network structure is determined by one parameter for relative group size (g) and 
one parameter for homophilic/heterophilic preferences of agents (h) (Karimi et al., 
2018). In addition, initial norms for agents were assigned based on three different 
pairs of binomial probabilities, resulting in three conditions for initial group norm 
distributions. Once the network structure is generated and agents are assigned 
their initial norms, each agent is assigned a threshold from a uniform distribution 
(Granovetter, 1978) and the model simulates normative social influence processes 
between agents by repeating 50 iterations of Granovetter's threshold model. Once 
the simulation is complete, we extract the percentage of agents holding each norm 
for each group, and the number of ties between agents within each group and 
between the groups. Crucially, we differentiate ties between agents holding the 
same norm and ties between agents with incompatible norms. Our model thus 
consists of four subsequent steps: generation of network structure, initialization of 
group norm distributions, the norm updating process, and the extraction of outcome 
metrics. 

In total, we simulate 150 unique parameter combinations with 20 networks per 
combination, resulting in 3000 unique networks (for an overview of the parameter 
space, see Table 1). For each of these networks, we are saving each iteration 
of Granovetter's threshold model as an individual network object, resulting in 
150,000 networks with 2000 agents each. Simulation was carried out on the High 
Performance Computing Cluster of the University of Cologne on 150 MPI nodes. 
We opted for 50 iterations of Granovetter's threshold model because it was the 
highest number of feasible iterations in the maximum computation time limit for 

Table 1 Range of parameter values of the simulation in the experiment

Parameter  Description                          Value(s)
n          No. of agents in network             2000
m          Minimum agent degree                 2
p1:p2      Initial group norm distribution (a)  [0.5:0.5] [0.6:0.4] [0.8:0.2]
t          Individual agent threshold           U(0,1) (b)
g          Group size                           0.1 to 0.5 in steps of 0.1
h          Homophily/heterophily parameter      0.1 to 1 in steps of 0.1

(a) Each of the three conditions compares different initial distributions of the majority norm in the
majority group (p1) and in the minority group (p2)
(b) U: uniform distribution


the MPI nodes (360 h) of the High Performance Computing Cluster. The simulation 
took approximately 13 days (315 h) and resulted in approximately 40 GB of output 
data. 
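
The size of the parameter space can be checked with a few lines of code. The sketch below is in Python rather than the authors' R, and the value grids for g and h are an assumption inferred from Table 1 and from the reported total of 150 combinations (3 norm conditions x 5 group sizes x 10 homophily values); all names are illustrative.

```python
from itertools import product

# Assumed grids (inferred from Table 1 and the reported 150 combinations).
NORM_CONDITIONS = [(0.5, 0.5), (0.6, 0.4), (0.8, 0.2)]   # (p1, p2)
GROUP_SIZES = [round(0.1 * i, 1) for i in range(1, 6)]    # g = 0.1 ... 0.5
HOMOPHILY = [round(0.1 * i, 1) for i in range(1, 11)]     # h = 0.1 ... 1.0
NETWORKS_PER_COMBINATION = 20
ITERATIONS = 50


def parameter_grid():
    """Return every (p1, p2, g, h) combination as a list of tuples."""
    return [
        (p1, p2, g, h)
        for (p1, p2), g, h in product(NORM_CONDITIONS, GROUP_SIZES, HOMOPHILY)
    ]


grid = parameter_grid()
n_networks = len(grid) * NETWORKS_PER_COMBINATION   # one network per replicate
n_snapshots = n_networks * ITERATIONS               # one saved network per iteration
print(len(grid), n_networks, n_snapshots)           # 150 3000 150000
```

Under these assumed grids the counts match the 150 combinations, 3000 networks, and 150,000 saved network objects reported in the text.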


4.2 Generation of Network Structure 


To generate different network structures that resemble real social networks and 
enable comparison of effects of g and h, we implemented the network generation 
algorithm by Karimi et al. (2018). This algorithm combines the preferential 
attachment mechanism, which has been observed in many large-scale social 
networks (Barabási and Réka, 1999), with tunable parameters for group sizes 
and homophilic/heterophilic tendencies of agents in the model. As a point of 
terminology, we will refer to the group containing more agents as the "majority
group" and the group containing fewer agents as the "minority group."

The network generation model implements an iterative growth process where we 
start out with a small number of m initial agents for both the majority group and the 
minority group. After this initial setting, one agent is added to the network at a time. 
Each new agent has a probability of g to be assigned to the minority group and a
probability of 1 - g to be assigned to the majority group. For example, with a value of
g = 0.4, each new agent has a probability of 40% to be assigned to the minority 
group and a probability of 60% to be assigned to the majority group. Each new 
agent forms m ties to the agents that are already present from previous steps. In this 
way, the parameter m also defines the minimum degree of agents in the network. We 
keep this parameter constant at m = 2 across all our generated networks because it 
ensures that no agent is isolated in the network. Previous research demonstrated that 
the choice of m does not change the properties of the network (Barabási and Réka,
1999).

Connecting these m ties from the new agent to existing agents is probabilistic, 
and relies on the homophily parameter h and the degree of the present agents 
(Karimi et al., 2018). The parameter h ranges from 0 to 1 and defines the likelihood 
of agents to form ties to agents from the same group (h) or from a different group (1 - h).
A value of 0 represents perfect heterophily (ties will only be formed between agents
assigned to different groups) and a value of 1 represents perfect homophily (ties
will only be formed between agents assigned to the same group). In addition, agents
also have a built-in preference for agents with high degree (preferential attachment),
which interacts with their group preference determined by h. Specifically, the
probability p_ij of each added agent j to form a tie with a present agent i depends on
the degree of the present agent (k_i) and the homophily parameter between
i and j, h_ij (equal to h if i and j belong to the same group and 1 - h otherwise),
divided by the sum of the same product over all existing agents, indexed by l:


$$p_{ij} = \frac{h_{ij}\, k_i}{\sum_{l} h_{lj}\, k_l} \qquad (1)$$

The processes of assigning agents to a group and selecting present agents to 
connect with are not deterministic, so the same set of initial parameters will generate 
slightly different network structures each time. To capture this variance, we generate 
20 networks per parameter combination and report averaged results. See Fig. 1 for 
an example, and see appendix for analytical derivations. 
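
The growth process described in this section can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation. It assumes that the 2m seed agents start fully connected (so that every seed agent has a nonzero degree and Eq. (1) is defined from the first step) and that a new agent never forms two ties to the same existing agent; the function name and group labels are hypothetical.

```python
import random


def generate_network(n, m, g, h, seed=None):
    """Grow a two-group network with preferential attachment and homophily.

    Returns (group, edges): group[i] is "minority" or "majority", and edges
    is a set of (i, j) pairs with i < j.
    """
    rng = random.Random(seed)
    # Seed: m agents per group. Assumption: the seed agents are fully connected.
    group = ["minority"] * m + ["majority"] * m
    edges = {(a, b) for a in range(2 * m) for b in range(a + 1, 2 * m)}
    degree = [2 * m - 1] * (2 * m)

    for j in range(2 * m, n):
        group_j = "minority" if rng.random() < g else "majority"
        candidates = list(range(j))
        # Numerators of Eq. (1): h_ij * k_i, with h_ij = h within a group
        # and 1 - h across groups. random.choices normalizes the weights.
        weights = [
            (h if group[i] == group_j else 1.0 - h) * degree[i]
            for i in candidates
        ]
        targets = []
        for _ in range(m):
            i = rng.choices(candidates, weights=weights, k=1)[0]
            targets.append(i)
            weights[i] = 0.0          # assumption: no duplicate ties
        group.append(group_j)
        degree.append(m)
        for i in targets:
            edges.add((i, j))
            degree[i] += 1
    return group, edges


group, edges = generate_network(n=100, m=2, g=0.2, h=0.8, seed=1)
print(len(group), len(edges))
```

Because group assignment and tie selection are both random draws, repeated calls with different seeds give different networks for the same (g, h), which is why results are averaged over several networks per parameter combination.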


4.3 Initialization of Group Norm Distributions 


After creating network structures based on the parameters g and h, we initialize 
a norm as an attribute in each agent. We will use “majority norm" and “minority 
norm" when we discuss our results with respect to the two different norms in 
our model. Specifically, majority norm will refer to the norm held by the larger 
proportion of agents in the larger of the two groups after initializing the network 
structure. In cases where the number of agents holding each norm is equal, we
simply track one of the two norms over the course of the simulation. 




Fig. 1 Generated networks with 100 agents and g = 0.2. From left to right, the networks are 
showcasing h = 0.2, h = 0.5, and h = 0.8. Node size represents logarithmized agent degree. 
Minority group agents (20%) are represented by black circles, majority group agents (80%) are 
represented by white squares. When the network is heterophilic (left), the minority group increases 
their degree rapidly due to the combination of preferential attachment and smaller group size. In 
the homophilic network (right), the minority group cannot grow their degree by attracting majority 
group agents 

We use a probabilistic process with two different parameters p1 and p2 for the
initial group norm distributions, where p1 describes the probability of agents in
the majority group to be assigned the majority norm, while 1 - p1 describes the
probability of agents in the majority group to be assigned the minority norm.
Vice versa, p2 describes the probability of agents in the minority group to be assigned
the majority norm, while 1 - p2 describes the probability of agents in the minority group
to be assigned the minority norm. For example, with p1 = 0.7 and p2 = 0.3,
each agent in the majority group has a 70% probability of being assigned the majority
norm and a 30% probability of being assigned the minority norm. Conversely, each
agent in the minority group has a 30% probability of being assigned the
majority norm and a 70% probability of being assigned the minority norm. In this
example, we can see that p1 and p2 define how closely the assignment of norms
is related to the group membership of agents. If p1 and p2 are both 0.5, then
there is no connection between group membership and norm: every agent of either
group has an equal probability (50%) to endorse either norm. If p1 is large and p2
is small, then initial norm proportions are associated with group membership: the
majority and the minority group preferentially use different norms. In our model,
we test one case where the initial norm distribution is unrelated to group
membership (p1 = 0.5 and p2 = 0.5), one where the initial norm distribution is
weakly related to group membership (p1 = 0.6 and p2 = 0.4), and one where the
initial norm distribution is strongly related to group membership (p1 = 0.8 and
p2 = 0.2). We thus generate models where (a) 50% of the majority group and 50%
of the minority group start with the majority norm, (b) 60% of the majority group
and 40% of the minority group start with the majority norm, and (c) 80% of the
majority group and 20% of the minority group start with the majority norm.
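
As an illustration of this initialization step, the following Python sketch draws one norm per agent from the group-specific probabilities p1 and p2. The function name and the string labels are hypothetical and are not taken from the authors' R code.

```python
import random


def initialize_norms(groups, p1, p2, seed=None):
    """Assign each agent the majority or the minority norm.

    groups: list of "majority" / "minority" labels, one per agent.
    p1: probability that a majority-group agent gets the majority norm.
    p2: probability that a minority-group agent gets the majority norm.
    """
    rng = random.Random(seed)
    norms = []
    for grp in groups:
        p_majority_norm = p1 if grp == "majority" else p2
        if rng.random() < p_majority_norm:
            norms.append("majority_norm")
        else:
            norms.append("minority_norm")
    return norms


groups = ["majority"] * 8 + ["minority"] * 2
print(initialize_norms(groups, p1=0.8, p2=0.2, seed=42))
```

Because each agent is drawn independently, the realized shares in a finite network only approximate p1 and p2.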


4.4 Norm Updating Process 


After initializing one of the two norms in each agent according to parameters p1
and p2, we simulate the adoption of norms over time within each network using
Granovetter's threshold model (Lewis et al., 2012; Aral and Walker, 2011; DiMaggio
and Garip, 2012). In our simulation, we use a modified version (Granovetter,
1978) where each agent in the model is assigned a threshold value from a uniform
distribution on [0, 1]. A central point in Granovetter's threshold model is the variability
of thresholds within a group. Once people with lower thresholds adopt a norm,
they will raise the proportion of people with that norm, increasing the chance of
shifting those who have higher thresholds (Granovetter, 1978). In his seminal work,
Granovetter (1978) showed these dynamics both with a uniform distribution and
a normal distribution of thresholds. In our model, we decided to use a uniform 
distribution of thresholds because our aim is to understand the role of network 
structure and initial norm distributions in normative conflict, and not primarily to 




investigate the effects of the threshold. To clearly understand emergent properties 
in agent-based models without extraneous mechanisms, it is beneficial to avoid 
unnecessary complexities (Eberlen et al., 2017). Non-uniform distributions require 
particular choices: either the single value of the threshold held by all agents or 
the mean and variance of a normally distributed threshold parameter. Thus, any 
distribution besides the uniform requires additional assumptions without adding 
a concrete contribution (Railsback and Grimm, 2019) to our research questions. 
We use a uniform distribution in our model to control the effect of the threshold 
distribution (Lee et al., 2015) while testing the effect of network structure and 
initial norm distribution. We also allow agents to change back and forth between 
norms as appropriate given their threshold and the norms of their neighbors. This is 
distinct from some models where an agent can only change once (e.g., learning of 
a new innovation), and we consider it appropriate for modeling our phenomenon of 
interest—descriptive social norms. 

In the updating process, each agent compares its threshold value to the proportion 
of its immediate neighbors holding a particular norm. If the proportion of neighbors 
that are expressing a given norm is equal to or higher than the agent's threshold, the 
agent will update its currently held norm. For example, if agent j has a threshold
of t_j = 0.6, it will update to the norm that 60% or more of its neighbors display.
Depending on the current norm of the agent, this can mean either switching to a
different norm or keeping the agent's current norm. If both proportions fail to reach
the threshold (e.g., a 50/50 distribution of norms in the neighborhood of the agent while the
threshold value is 0.6), the agent will also keep the current norm. In cases where
the observed proportions of both norms are equal and meet or exceed the agent's threshold,
the agent will choose one of the two norms at random. Each network goes through
50 iterations of the updating process, so all agents update their norms 50 times. 
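
The update rule for a single agent can be sketched as below (Python, illustrative only). The norm labels "A" and "B" are hypothetical. The text does not state what happens when both norms reach the threshold with unequal shares (possible for thresholds below 0.5); the sketch assumes the agent then adopts the more prevalent norm, and this choice is an assumption rather than the authors' rule.

```python
import random


def updated_norm(current, neighbor_norms, threshold, rng=random):
    """Return the norm an agent holds after one update step."""
    n = len(neighbor_norms)
    if n == 0:
        return current                      # no neighbors, nothing to compare
    counts = {norm: neighbor_norms.count(norm) for norm in ("A", "B")}
    reached = [norm for norm in ("A", "B") if counts[norm] / n >= threshold]
    if not reached:
        return current                      # neither norm reaches the threshold
    if len(reached) == 1:
        return reached[0]                   # may equal the current norm
    if counts["A"] == counts["B"]:
        return rng.choice(reached)          # equal shares: pick at random
    # Assumption (not specified in the text): unequal shares that both reach
    # the threshold resolve to the more prevalent norm.
    return max(reached, key=counts.get)


print(updated_norm("A", ["B", "B", "B", "A", "A"], threshold=0.6))  # B
print(updated_norm("A", ["A", "B"], threshold=0.6))                 # A
```

In the first call 60% of the neighbors hold norm B, which meets the threshold of 0.6, so the agent switches; in the second call neither norm reaches 0.6 and the agent keeps its norm.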

In each iteration of the norm updating process, all agents are updated asyn- 
chronously, meaning that only one agent is updated at a time and the order in which 
agents update their norms is randomly shuffled before each iteration of the updating 
process. Thus, each agent's updating process can affect the updating process of the 
next agent. We chose this procedure as opposed to having a fixed order for updating 
agents or updating all agents at the same time because natural social interactions 
neither occur in a predetermined order nor do all people in a social network exert 
influence on each other simultaneously. For this reason, we argue that our approach 
more closely resembles real-life interactions and social influence processes between 
people. 
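
The asynchronous schedule can be sketched independently of the specific update rule. In the Python illustration below, `update_rule` is a hypothetical callable standing in for the threshold comparison, and the toy rule `copy_strict_majority` is used only to keep the example short; neither is taken from the authors' code.

```python
import random


def asynchronous_sweeps(neighbors, norms, update_rule, iterations=50, seed=None):
    """Update agents one at a time, in a freshly shuffled order per iteration.

    neighbors: dict mapping each agent to a list of neighboring agents.
    norms: dict mapping each agent to its current norm.
    update_rule: callable (agent, current_norm, neighbor_norms) -> new norm.
    """
    rng = random.Random(seed)
    norms = dict(norms)                  # work on a copy of the input
    order = list(neighbors)
    for _ in range(iterations):
        rng.shuffle(order)               # new random order every iteration
        for agent in order:
            # Neighbors' norms are read at the moment of the update, so
            # earlier updates in the same sweep already count.
            neighbor_norms = [norms[v] for v in neighbors[agent]]
            norms[agent] = update_rule(agent, norms[agent], neighbor_norms)
    return norms


def copy_strict_majority(agent, current, neighbor_norms):
    """Toy rule: adopt the norm held by a strict majority of neighbors."""
    count_a = neighbor_norms.count("A")
    count_b = len(neighbor_norms) - count_a
    if count_a == count_b:
        return current
    return "A" if count_a > count_b else "B"


line = {0: [1], 1: [0, 2], 2: [1]}       # three agents on a line
start = {0: "A", 1: "A", 2: "B"}
result = asynchronous_sweeps(line, start, copy_strict_majority, iterations=1, seed=0)
print(result)
```

A synchronous variant would instead compute all new norms from a frozen copy of the previous state and apply them at once.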


4.5 Outcome Metrics 


After the agent-based model finishes, we extract our outcomes of interest: The 
degree to which norm distributions change, the degree to which the difference in 
norm distributions between the two groups changes, and the potential for conflict 
within and between the groups. 

To operationalize the degree to which norm distributions change, the initial
proportion of agents holding the majority norm is subtracted from the final
proportion of agents holding the majority norm. We subsequently call this Change
in Majority Norm because it expresses the degree to which the group has adopted
the majority norm relative to the group's starting point. If this number is positive,
the group's use of the majority norm has increased over the course of the simulation.
For example, if the network starts with an 80-20 group norm distribution and ends
with 60% of the minority group endorsing the majority norm, the minority group
has adopted the majority norm by 40 percentage points. If change in majority norm is negative, the
group has rejected the majority norm. In a similar example, if the network starts
with an 80-20 group norm distribution and ends with 10% of the minority group
endorsing the majority norm, the minority group has rejected the majority norm by
10 percentage points. This is a group-level outcome: it tells us how the normative consensus within
the majority group and within the minority group has changed over time. It is
worth noting that the initial norm distribution limits the possible change within the 
majority group and the minority group. In the 80-20 initial norm distribution, only 
20% more of the majority group could hold the majority norm, while 80% more of 
the minority group could do so. 
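
The Change in Majority Norm metric can be computed per group as in the following Python sketch. The function names and labels are illustrative, and the toy data reproduce the first example above (a minority group moving from 20% to 60% endorsement of the majority norm).

```python
def majority_norm_share(groups, norms, group):
    """Share of agents in `group` that hold the majority norm."""
    held = [norm for grp, norm in zip(groups, norms) if grp == group]
    return sum(norm == "majority_norm" for norm in held) / len(held)


def change_in_majority_norm(groups, initial_norms, final_norms, group):
    """Final minus initial share of the majority norm within `group`."""
    return (majority_norm_share(groups, final_norms, group)
            - majority_norm_share(groups, initial_norms, group))


# Toy minority group of 10 agents: 2 hold the majority norm at the start,
# 6 hold it at the end.
groups = ["minority"] * 10
initial = ["majority_norm"] * 2 + ["minority_norm"] * 8
final = ["majority_norm"] * 6 + ["minority_norm"] * 4
print(round(change_in_majority_norm(groups, initial, final, "minority"), 2))  # 0.4
```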

At a system level, we are interested in the degree to which the difference in 
norm distributions between the two groups changes. Specifically, we are interested 
in whether the two groups express the two norms in similar proportions after the 
last iteration and if they have become more similar in their norm proportions 
over time. To calculate this, we first calculate the initial group norm difference
by subtracting the initial proportion of the minority group holding the majority
norm from the initial proportion of the majority group holding the majority norm,
$\Delta(P)_{\text{initial}} = P_{1,\text{initial}} - P_{2,\text{initial}}$ (see Sect. 4.3). Then we calculate the final group
norm difference by subtracting the final proportion of the minority group holding the
majority norm from the final proportion of the majority group holding the majority
norm, $\Delta(P)_{\text{final}} = P_{1,\text{final}} - P_{2,\text{final}}$. We subtract the initial group norm difference
from the final group norm difference to define Change in Group Norm Difference,
$\Delta(P)_{\text{final}} - \Delta(P)_{\text{initial}}$. If this is positive, then the difference has increased; the groups
have become less similar over the course of the simulation in terms of their norms.
If this is negative, then the group norm difference has decreased; the groups have 
become more similar. Once again, it is worth noting that the initial group norm 
distribution limits total possible change. 
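
A worked example may make the sign convention concrete. The final proportions below (0.9 and 0.6) are hypothetical values chosen for illustration, not simulation results; the initial proportions correspond to the 80-20 condition.

```latex
\begin{align*}
\Delta(P)_{\text{initial}} &= P_{1,\text{initial}} - P_{2,\text{initial}} = 0.8 - 0.2 = 0.6 \\
\Delta(P)_{\text{final}}   &= P_{1,\text{final}} - P_{2,\text{final}} = 0.9 - 0.6 = 0.3 \\
\Delta(P)_{\text{final}} - \Delta(P)_{\text{initial}} &= 0.3 - 0.6 = -0.3
\end{align*}
```

The negative value indicates that the groups have become more similar. Since each proportion lies in [0, 1], the final difference lies in [-1, 1], so in the 80-20 condition the change is bounded between -1 - 0.6 = -1.6 and 1 - 0.6 = 0.4, whereas in the 50-50 condition (initial difference 0) it is bounded between -1 and 1.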

At a dyadic level, we are interested in the potential for interpersonal conflict 
between and within groups. To look at this, we define Conflict Ties as ties connecting 
two agents with different norms after the last iteration. Crucially, we distinguish 
between conflict ties of agents from the same group as a proxy for potential 
intragroup conflict and conflict ties of agents from different groups as a proxy for 
potential intergroup conflict. In particular, we are extracting the proportion of ties 
in the majority group that connect agents with inconsistent norms, the proportion 
of ties in the minority group that connect agents with inconsistent norms, and the 
proportion of ties between the groups that connect agents with inconsistent norms. 
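
The three conflict-tie proportions can be extracted from an edge list as sketched below in Python. The data layout (an edge list plus per-agent group and norm labels) and all names are assumptions made for illustration.

```python
def conflict_tie_proportions(edges, groups, norms):
    """Proportion of conflict ties within each group and between groups.

    edges: iterable of (i, j) agent pairs.
    groups: per-agent "majority" / "minority" labels.
    norms: per-agent norm labels; a tie is a conflict tie if the norms differ.
    Returns None for a tie type that does not occur in the network.
    """
    totals = {"within_majority": 0, "within_minority": 0, "between": 0}
    conflicts = dict.fromkeys(totals, 0)
    for i, j in edges:
        if groups[i] != groups[j]:
            kind = "between"
        elif groups[i] == "majority":
            kind = "within_majority"
        else:
            kind = "within_minority"
        totals[kind] += 1
        if norms[i] != norms[j]:
            conflicts[kind] += 1
    return {
        kind: (conflicts[kind] / totals[kind] if totals[kind] else None)
        for kind in totals
    }


# Toy network: agents 0 and 1 form the majority group, 2 and 3 the minority.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
groups = ["majority", "majority", "minority", "minority"]
norms = ["A", "B", "B", "B"]
print(conflict_tie_proportions(edges, groups, norms))
# {'within_majority': 1.0, 'within_minority': 0.0, 'between': 0.5}
```

Normalizing by the number of ties of each type keeps the three proportions comparable even when groups differ strongly in size.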


5 Simulation Results


Our results are structured around the three overarching outcome metrics outlined 
above. For each metric, we consider the aggregated output of our runs by averaging
over the values obtained from the 20 simulated networks per parameter combination.


1. Change in Majority Norm: Which combinations of parameters increase or 
decrease the prevalence of the majority norm? In which cases does the majority 
norm become prevalent among the majority and the minority group? In which 
cases does the minority norm gain prevalence? 

2. Change in Group Norm Difference: Which combinations of parameters reduce 
between-group norm differences? Which make convergence of norms most 
likely? 

3. Conflict Ties: Which sets of parameters make it most likely that potential within- 
group or between-group conflict will emerge? Which make it most likely that 
there will be little potential for conflict? 


5.1 Change in Majority Norm 


Our first interest is how the representation of norms within groups changes, using 
our Change in Majority Norm metric (see Sect. 4.5). Figure 2 displays the results 
of the simulations, showing how this is influenced by homophily/heterophily, group 
sizes, and initial group norm distributions. 

This visualization highlights several findings. First, the effect of homophily and
group size on the results is clearest when the initial group norm distribution is 80-20.
That is, when norms are highly aligned with group membership, the influence of
network structure is most pronounced. When the initial norm distributions are 50-50
in each group, the change in norm proportion is random: the system is not changing
systematically even with varying levels of group sizes and homophily/heterophily.
Second, the patterns of results for the majority and minority groups are distinct. In the
majority group, high heterophily (i.e., a greater proportion of connections to the 
minority) leads to stronger adoption of the minority norm. Similarly, as the size 
of the minority group increases, the majority group is more likely to adopt the 
minority group norm. Within Granovetter's threshold model, this is very reasonable: 
increased minority group size makes it more likely for a majority group member to 
be connected to members of the minority group and take on the minority norm. 

The minority group adopts the majority norm most when homophily is middling 
and the minority group is small. The minority group maintains or increases its own 
norm most when it is relatively large, or when homophily is very high or very low. 
This suggests the operation of multiple mechanisms at different intersections of 
homophily and group size (see analytical derivations in the appendix). When the 

Fig. 2 Change in Majority Norm for majority and minority group. This set of heatmaps displays
the influence of network homophily/heterophily, group size, and initial norm distributions on
change in the majority norm. Each square represents the degree to which representation of the
majority norm has increased or decreased in each group. Darker blue means shift towards the
majority norm and darker orange means shift towards the minority norm. When norms are initially
distributed equally (50-50, top row), the change in group norm difference is essentially random and
does not depend on the properties of network structure and group size. When norms are initially
distributed unequally (e.g., 80-20, bottom row), we observe the impact of homophily and group
size. For small homophily values, majority members are more likely to change their norm to the
minority norm. As homophily increases, the majority and the minority are both likely to adopt
the majority norm (until h = 1, when the pattern is reversed). In general, as the minority group
increases in size, it is more likely to retain its own norm and influence the majority


network is highly homophilic, the minority maintains its own norm because it is 
selectively attached to members of its own group, thereby avoiding exposure to 
majority-group influence. When the network is heterophilic and the minority group 
is small, the minority is also more able to maintain its own norm. This is because 
this network parameterization results in minority group members becoming hubs: 
each majority group member connects to minority group members, and there are 
not many of them. This means each minority group agent has disproportionate 
influence. With a large minority group, minority agents have a higher likelihood 
to be attached to other minority agents, again making it more likely that they will 
maintain their own norm distribution. The results of the simulation are in agreement
with our analytical results provided in the appendix. 


5.2 Change in Group Norm Difference 


Our second point of interest is the degree to which the two groups become more 
similar in their group norm distributions. To address this, we use our Change in 
Group Norm Difference metric (see Sect. 4.5). The more negative this number, the 
more similar the groups have become in their norm distributions; the more positive, 
the more the groups have diverged in their norm distributions. Figure 3 displays the 
results of the simulations. 

Fig. 3 Change in Group Norm Difference. The more negative this number, the more similar the
groups have become in their norm distributions; the more positive, the more the groups have
diverged in their norm distributions. In the 50-50 distribution condition, we see that there is no
systematic effect of the two groups becoming more normatively similar. In the 80-20 distribution,
the network structure results in strong mutual conformity unless homophily and/or minority group
size is very high




As with the change in norm proportions, the effects of homophily and group
proportion are clearest when the initial group norm distribution is strongly associated
with group membership (i.e., 80-20 initial group norm distribution). In this
case, we can see there is a strong pattern of the two groups moving towards similar
norm distributions (i.e., reducing their differences). This pattern is less pronounced or
reversed as homophily increases and minority group size increases. This suggests 
that heterophily is important for producing between-group norm similarity, while 
high homophily may actually increase between-group norm difference. 


5.3 Conflict Ties 


Our third question revolves around the remaining potential for normative conflict, 
once the simulation has run. For this, we look at the proportion of within- and 
between-group ties that are Conflict Ties at the end of the simulation. Figure 4 
shows the results of our simulation for proportion of within-group and between- 
group ties that are conflict ties. We display results from the 80-20 initial group
norm distribution, where group membership and initial norm distribution are closely
connected. As with the prior analyses, the results of the 50-50 initial norm
distribution were essentially random, and the pattern in the 60-40 initial norm
distribution is similar to the 80-20 case but not as strong.

Comparing the three graphs in Fig. 4, we see that the level of network homophily 
determines the trade-off between intergroup and intragroup conflict. In high-homophily
networks, high potential for intergroup conflict remains at the end of
the simulation, but there is little potential for intragroup conflict. In contrast,
high-heterophily networks have very little remaining potential for intergroup conflict, but
slightly higher potential for intragroup conflict.

The role of minority group size also emerges clearly in Fig. 4. For between- 
group ties and majority-group ties, having a small minority group reduces potential 
conflict. This effect is relatively consistent across all the levels of homophily, though
it is more exaggerated at more extreme ones. Within the minority group, group size
does not appear to have as consistent an effect on conflict ties.


6 Discussion and Conclusion 


We see three important strands in our pattern of results. First, they speak to 
the degree to which the alignment of initial group norm distributions and group 
membership is crucial for the process of reaching normative consensus. Second, 
they point towards the impact of homophily and heterophily in balancing between 
in-group and out-group conflict. Finally, they point towards strategies that could 
be used to maintain minority norms in minority groups and to avoid large-scale 
assimilation. 

Fig. 4 Final Proportion of Conflict Ties in 80-20 Initial Norm Distribution. We see that highly
homophilic networks still have relatively high potential for between-group conflict (top row). In
contrast, when there is low homophily, the between-group conflict decreases. A reverse pattern
appears for within-group conflict ties (second and third rows). As homophily increases,
within-group conflict decreases


6.1 The Alignment of Norms and Group Membership 


One clear result of our simulation is that, in a system with conflicting norms, 
substantive change occurs only when the norm is highly aligned with group 
membership. In our model, this took the form of an 80-20 initial norm distribution,
where 80% of the majority group but only 20% of the minority group initially held 
the majority norm. In cases where the norm was not aligned with group membership 
(50-50 initial norm distribution, top row of Figs. 2 and 3), we do not observe any 
clear globally dominant norm at the end of the simulation. Even in cases with a 
relatively large majority group (minority group only 10% of the network), there 
was no particular norm change because the social influence of the majority group 
was evenly split between two norms. When the norm is moderately aligned with
group membership (60-40 initial norm distribution), we see intermediate results:
not entirely random as with the 50-50 distribution, but less clear than when the norm is strongly
associated with group membership.

In intergroup situations, we see that group-level and system-level influence arises 
not out of small pockets of extremely strong beliefs (i.e., the small minority group 
in an 80-20 initial norm distribution), but rather out of the consistent homogeneous
norm of a majority group. There are cases of normative disagreement that take
on proportions like this: our headscarf example from the beginning, for instance,
showed 81% of Germans in favor of banning the headscarf in public institutions,
with only 15% contradicting that opinion. Though such distinct norms are likely to
be newsworthy, perhaps there are many instances of intergroup norm non-conflict 
that receive less attention. Newspapers are unlikely to report that two neighbors 
from different cultural backgrounds both like to eat dinner with their families, but it 
may be important for collective cohesion nonetheless. 

This also supports prior literature suggesting that groups with consensual norms 
are most likely to prompt normative change in their outgroup. A recent survey in 
the USA, for instance, indicated a 50-50 split on whether football players should
be required to stand during the national anthem (Vandermaas-Peeler et al., 2018). 
In this case, Americans as a single majority group are unlikely to exert much 
normative force on out-group members (e.g., Canadians) about this issue. If we 
consider subgroups of Americans (i.e., Republicans and Democrats), this norm may 
be much more strongly associated with group membership and thus more likely to 
have an effect. 

Our model focuses on the shift in a specific norm within a network. This fits our 
interest in descriptive norms, though real cultural practices might be whole clusters 
of normative behaviors rather than single binary norms. A contrast between Jewish 
and Christian people, for example, is not only that they attend different religious 
services, but also that they can have distinct injunctive norms around weekend hours, 
food, and marriage that are culturally transmitted. One option is to consider the norm 
in our model as an aggregate, i.e., not a single behavior, but a cluster of group-based 
behaviors. Another option is to consider the norm in our model to be the behavior 
which people notice within a specific context. 


6.2 Homophily Balances In-Group and Between-Group 
Conflict 


One of the primary aims of this model was to understand when and how subgroups 
would conform to each other. In Fig. 3, we see that between-group differences
in norms are clearly reduced by the norm updating process, particularly when 
norms are strongly associated with group membership (i.e., 80-20 initial norm 
distribution). Except in cases of large minority groups or extremely homophilic 
networks, there is a meaningful reduction in between-group norm differences: the
groups become more similar as the individual agents change their norms. Looking at
Fig. 2, it is clear that most of the norm change happens in the minority group: they
tend to update their norm to that of the majority group, especially when homophily is
intermediate and the minority group is not large. In contrast, we see the majority group
leaving their norm and adopting the minority norm when the network is extremely 
heterophilic (i.e., h = 0.1) (Fig. 2). This occurs while the minority group is updating 
to the majority norm. In this situation, heterophily is so strong that the members of 
the majority group are disproportionately exposed to the norm of the minority group; 
this allows for strong influence of the minority group even when the minority is 
quite small. Thus, though the system overall produces mutual conformity, the level 
of homophily balances which group is changing their norms to accommodate to the 
other group. 

In Fig. 4, homophily again balances group-level and system-level outcomes
when considering the remaining potential for conflict within the network. When 
the network is heterophilic or neutral, few between-group conflict ties remain. In 
contrast, when the network is very homophilic, we see the potential for intergroup 
conflict almost doubled. The reverse is true for within-group conflict ties. When the 
network is heterophilic or neutral, a fair number of within-group conflict ties remain. 
When the network is very homophilic, this potential intragroup conflict is reduced 
by at least half. Thus, we see that both in terms of which group changes their norms 
and the potential conflict that remains, homophily balances between group-level and 
system-level outcomes. 


6.3 Strategies to Maintain Minority Norms 


The maintenance of a cultural identity, partially defined by normative practice, 
can be extremely important. Our simulation lends support to three methods for 
maintaining minority cultural practice visibly employed by minority groups in 
reality: isolationism, adopting positions of influence, and increasing the group size 
of one's minority. Within the model, the minority group was best able to maintain 
their own norm in extremely homophilic networks, extremely heterophilic networks, 
and when their group was large. 

Extremely homophilic networks in our simulation mimic strongly isolationist 
cultures in reality. Such isolation can be imposed upon a minority group (e.g., being 
excluded from mainstream culture), but can also be sought out as a source of cultural 
affirmation and strength (e.g., resisting assimilation into mainstream culture) (Berry, 
2005). This latter motivation has been expressed by groups as different as the Amish 
in the USA and anti-capitalist leadership in China. The recognition of community- 
level benefits of culturally affirming and relatively homogeneous environments can 
be seen in the push to maintain historically black colleges and universities, even
as black students in America have increasing access to other institutions (Franke
and DeAngelo, 2018). Though isolationism may draw critique as backward-looking, 
it can be a deep recognition that intergroup contact can fundamentally affect the 
culture of a minority group. 

Extremely heterophilic networks in our simulation, in contrast, are closely related 
to minority groups which attempt to have their members in positions of overall 
societal power. Rather than completely preserving group norms through isolation, 
this strategy attempts to change the larger culture by exerting influence on the 
majority. This can be observed in efforts to get members of minority groups elected 
to positions of power, with the explicit goal of increasing minority voice in the 
government. By holding positions of power within a larger society, minority group 
members can become hubs to spread their own group norms. 

The final strategy we can relate to our results is to increase one's group size. The 
logic here is fairly straightforward: the larger a group, the greater chance it has of 
influencing the whole system. We can see this strategy in the tendency of minority 
group members to define their groups expansively, stressing the similarities with 
the majority group (Wimmer, 2013), and the converse tendency of majority group 
members to define their groups strictly (Dovidio et al., 2007). 

The three strategies which emerge from our study are far from a complete set; 
there are many other strategies well outside the scope of our current work. For 
example, minority groups actively resist norm change (Xie et al., 2010), cultural 
institutions formally negotiate over cultural practices, and younger generations 
modify their inherited cultural practices. We leave model-based exploration of these 
possibilities for future work. 


6.4 Limitations and Future Directions 


In the effort to construct a parsimonious model from existing theory, we acknowl- 
edge that there are many assumptions in this model that could be productively 
expanded. First, one could incorporate more than two groups or multiple kinds 
of interpersonal ties. Second, one could make the model more realistic by having 
a series of inter-correlated norms held by each group, such that individuals have 
different thresholds to specific norms, or a different weight for norms depending 
on in-group membership of neighbors. Third, one could integrate psychological 
theories of preferential information processing to have agents differentially weight 
the norms expressed by their neighbors based on shared in-group membership. 
Such modifications would allow us to expand from descriptive to injunctive 
norms, involving higher-order cognitive processes such as persuasion (Cialdini and 
Goldstein, 2004) and contrast with personal values (Wei et al., 2016) that could 
be modeled in agents. Finally, it would be valuable to explore other distributions 
of thresholds within the network to explore more realistic and complex scenarios. 
These further developments would also increase the options for validating this
model against real world data (e.g., gathering experimental data or found social 
network data measuring intergroup norm spread). Thus, continuing to grow this 
work can increase its contribution to the nexus between networks, social norms, and 
conflict. 

Despite these limitations, the current study provides a novel and meaningful 
insight by providing a streamlined example of how group size and homophily can 
affect the adoption and maintenance of group-affiliated norms. We have shown 
that even in this simplified version of reality, differences in group proportions 
and homophily have different effects for majority and minority groups, and can 
affect the degree to which groups eventually adopt similar distributions of norms. 
We also contribute to the exciting interdisciplinary growth of computational social 
science by providing a novel agent-based model that includes both structure of social 
networks and social influence in one framework. 

Finally, we hope that this work contributes to existing knowledge on assimilation, 
acculturation, and between-group conflict over norms. Our simulation demonstrates 
that assimilation is most likely at low (heterophilic networks) and intermediate 
levels of homophily. At intermediate levels, the minority group largely conforms 
to the majority group. This moves the system towards collective harmony, but does 
so at the cost of the minority group giving up its own norms. At low levels of 
homophily, when minority group members have a structural advantage within the 
network (i.e., central positions with many ties), we see accommodation from both 
directions: the minority members take on the norm of the majority group, but the 
majority members also take on the norm of the minority group. Taken together, these 
suggest that collective harmony is maximized when groups are interconnected, and 
that this is accompanied by the dispersion of minority norms when there is a strong 
preference for out-group contact. 

Appendix: Analytical Derivations for Norm Endorsement


In this appendix, we derive the probabilities of norm endorsement in each group 
using the mean-field approach. This analysis enables us to gain insights on the 
relationship between the model parameters of homophily, group size, and group 
norm distribution. In addition, the analytical derivations help us to interpret the 
outcome of the simulations in Sect. 5. 

More specifically, we calculate the probabilities of a minority agent to update to
the majority norm and vice versa. We use the mean-field approximation (also known as
the deterministic approximation), which means that we look at the average behavior
of the group in an equilibrium state (Marro and Dickman, 2005). That is,
we do not consider changes over time or the heterogeneity of the agents.
Nevertheless, the mean-field approach gives us useful insight for forecasting the
overall behavior of the system. Let us assume that the minority is denoted by a
and the majority is denoted by b. The two norms are denoted by norm A and norm
B. Homophily is denoted by h and group proportion is denoted by g. In order to
calculate the probability of a minority agent to update to the majority norm (B), we
need to estimate the probability of a minority agent to be connected to majority agents
($p_{ab}$) and the probability of the minority agent to be connected to minority agents
($p_{aa}$). Since our agent-based model assumes a preferential attachment mechanism
and defines group proportion (g), the probability of two agents to be connected
depends on their homophily (h) and the degree of the agent (k). Link formation is
a combination of two mechanisms, namely homophily and preferential attachment,
and thus the probability of connectivity follows a nonlinear function. To estimate
the link probabilities, apart from homophily, we need to estimate the degree growth
function (C) of each group of agents. The degree growth determines the attractivity
of the agents with regard to their degree. The degree growth in this model follows a
polynomial function of order three with one valid solution and it can be calculated
numerically (Karimi et al., 2018):


$$C = g\left(1 + \frac{hC}{hC + (1-h)(2-C)}\right) + (1-g)\,\frac{(1-h)C}{h(2-C) + (1-h)C}.$$


The probability of two agents of group a ($p_{aa}$) and two agents of group b ($p_{bb}$)
to be connected is

$$p_{aa} = \frac{hC}{hC + (1-h)(2-C)}, \qquad p_{bb} = \frac{h(2-C)}{h(2-C) + (1-h)C}. \tag{3}$$

In addition, the degree growth function has the following relation to the
probability of linkage (Karimi et al., 2018):

$$C = g\,(1 + p_{aa}) + (1-g)\,p_{ba}, \tag{4}$$

where $p_{ba} = 1 - p_{bb}$ is the probability of a majority agent to be connected to a
minority agent.
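As a minimal numerical sketch of this step, the degree growth C can be located as the root of Eq. (4) after substituting the link probabilities of Eq. (3). The sketch assumes that g is the minority fraction, that $p_{ba} = 1 - p_{bb}$, and that 0 < h < 1 and 0 < g < 1; the function names and the use of bisection are illustrative choices, not the procedure reported by Karimi et al. (2018).

```python
def link_probabilities(h, C):
    """Within-group link probabilities (p_aa, p_bb) as in Eq. (3)."""
    p_aa = h * C / (h * C + (1 - h) * (2 - C))
    p_bb = h * (2 - C) / (h * (2 - C) + (1 - h) * C)
    return p_aa, p_bb


def residual(C, h, g):
    """Right-hand side of Eq. (4) minus C; zero at the valid solution."""
    p_aa, p_bb = link_probabilities(h, C)
    p_ba = 1 - p_bb  # assumption: a majority agent links to either group
    return g * (1 + p_aa) + (1 - g) * p_ba - C


def solve_degree_growth(h, g, tol=1e-12):
    """Bisection on (0, 2): the residual is positive near 0 and negative near 2."""
    lo, hi = 1e-9, 2 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid, h, g) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for h in (0.2, 0.5, 0.8):
        C = solve_degree_growth(h, g=0.2)
        print(h, round(C, 4), link_probabilities(h, C))
```

Two sanity checks follow directly from the equations: without a homophily preference (h = 0.5) the solution reduces to C = 2g, and for equally sized groups (g = 0.5) it is C = 1 for any h.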


The probability of a minority agent to update to the majority norm ($f_{aB}$) depends
on the probability of being connected to majority ($p_{ab}$) and minority ($p_{aa}$) agents. Thus,
for a minority agent, the fraction of neighbors with norm B is

$$f_{aB} = \frac{p_{aa}\,p_{aB} + p_{ab}\,p_{bB}}{p_{aa} + p_{ab}}. \tag{5}$$


The numerator consists of two parts: the probability of connecting to another
minority agent with norm B ($p_{aa}\,p_{aB}$) and the probability of connecting to a majority
agent with norm B ($p_{ab}\,p_{bB}$). To estimate the fraction, the numerator should be divided by
the total probability of connectivity of the minority agent to minority and majority agents.
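Inserting Eq. (3) into Eq. (5) and using $p_{ab} = 1 - p_{aa}$, we find

$$f_{aB} = \frac{hC\,p_{aB} + (1-h)(2-C)\,p_{bB}}{hC + (1-h)(2-C)}. \tag{6}$$
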
A similar relation can be found for the probability of a majority agent to update to
the minority norm ($f_{bA}$):

$$f_{bA} = \frac{p_{bb}\,p_{bA} + p_{ba}\,p_{aA}}{p_{bb} + p_{ba}}. \tag{7}$$
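The two fractions in Eqs. (5) and (7) can be evaluated with a few lines of code. The following is a minimal sketch under stated assumptions: cross-group link probabilities are taken as complements of the within-group ones ($p_{ab} = 1 - p_{aa}$, $p_{ba} = 1 - p_{bb}$), the norm is binary so that $p_{aA} = 1 - p_{aB}$ and $p_{bA} = 1 - p_{bB}$, and the function name and input values are hypothetical.

```python
def update_fractions(p_aa, p_bb, p_aB, p_bB):
    """Return (f_aB, f_bA) following Eqs. (5) and (7).

    p_aa, p_bb: within-group link probabilities (Eq. 3).
    p_aB, p_bB: share of minority / majority agents holding norm B.
    """
    p_ab = 1 - p_aa  # assumption: links go to one of the two groups
    p_ba = 1 - p_bb
    p_aA = 1 - p_aB  # assumption: every agent holds either norm A or norm B
    p_bA = 1 - p_bB
    f_aB = (p_aa * p_aB + p_ab * p_bB) / (p_aa + p_ab)
    f_bA = (p_bb * p_bA + p_ba * p_aA) / (p_bb + p_ba)
    return f_aB, f_bA


if __name__ == "__main__":
    # Hypothetical inputs: 20% of the minority and 80% of the majority hold norm B.
    print(update_fractions(p_aa=0.2, p_bb=0.9, p_aB=0.2, p_bB=0.8))
```

With these illustrative inputs, a minority agent is surrounded mostly by norm B while a majority agent sees mostly its own norm, which is the kind of asymmetry the analytical results describe.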


Figure 5 displays the analytical results derived from the above derivations. It is
interesting to note that the update to the norm of the other group follows a nonlinear
and asymmetrical trend for both the minority and the majority. At intermediate
levels of homophily (0.5 < h < 0.8), while the majority members resist switching
to the minority norm, the minority updates to the majority norm with high
probability. This creates an advantage for the majority norm to persist
and stabilize. Only when homophily is very high does the probability of the minority
members updating to the majority norm start to decrease. As the minority size
shrinks, the inequality in norm adoption increases.

Fig. 5 Analytical results for the probability of minority (left) and majority (right) to update to
the norm of the other group. Initial norm proportion is set to 20-80. We observe asymmetrical
results as the group balance deviates from the 50-50 condition. For small values of homophily
(0 < h < 0.2) we observe similar behavior for majority and minority. However, as homophily increases,
we observe that minority members update their norm to that of the majority with high probability,
while the majority does not update to the minority norm. The asymmetric relation is more pronounced
as the minority group size decreases

Abstract This chapter studies individual and network conditions for the emergence
of large social protests in an agent-based model. We use two recent examples 
from Iran and Germany to inform the modeling process. In our agent-based model, 
people, who are interconnected in networks, interact and exchange their concerns on 
a finite number of topics. They may start to protest either because of their concern 
or because the fraction of protesters in their social contacts exceeds their protest 
threshold. In contrast to many other models of social protest, we also study the 
coevolution of topics of concern in the not (yet) protesting public. Given that often 
a small number of citizens starts a protest, its fate depends not only on the dynamics 
of social activation but also on the buildup of concern with respect to competing 
topics. Nowadays, this buildup happens decentralized through social media. The 
model reproduces characteristic patterns of the evolution of the two empirical cases 
of social protests in Iran and Germany. In particular, our results show that positions 
of agents with certain concern levels on certain topics within the networks are 
important for the fate of protests. 

1 Introduction


Street protests frequently happen all over the world. In the USA alone, from January 
20, 2017 to June 16, 2019, 13,761 protests with 10,827,646 attendees have been 
recorded (Count Love 2019). Street protests are typical examples of emergent social 
phenomena that result from the interaction of many heterogeneous and autonomous 
agents. Changes in the number of attendees and topics of protest are inherent in
street protests, as the case of Russia from 2007 to 2013 shows (Lankina 2014). By
topic of protest, we mean the main issue that protesters address in a protest. After
the emergence of a protest, the slogans and behavior of protesters, peaceful or violent,
influence how others judge it and may deter more people from joining.

New technologies, specifically social media, have changed our communication.
They have made cheap and fast interactions among a large number of actors
possible in a decentralized way (assuming that administrators of social media do not
intervene in interactions). These technologies have helped to organize protests.
A protest announcement can reach millions in no time without central control. 
Moreover, they have enabled amateur multimedia reporters to broadcast details of 
the protest in real time. 

On the other hand, social media is not only a place for announcing street protests 
or sharing information about these. Social media is also a place for genuinely digital 
protest in the form of campaigns and conflicts among users. In this chapter, we 
address the mutual relationship between street protests and their perception by the 
public. A significant part of this perception emerges in the media in general and 
in social media in particular (see Elson et al. 2012 and Anstead and O'Loughlin 
2014). Not everyone is active in social media, and users do not publish all their 
ideas. Nevertheless, social media still captures reasonably how people perceive a 
street protest, due to the large number of users and their somewhat equal share of 
power. 

The role of social media in the emergence of conflicts has been well studied,
with street protests and social media treated as intercorrelated and interdependent
(Ayres 1999; Gerbaudo 2012; Valenzuela et al. 2012; Penney and Dadas 2014; 
Qi et al. 2016). Hussain and Howard (2013) show that mobile phone usage had a 
crucial role in the success of the Arab Spring social movements, while social media 
helped to expand it (Lim 2012; Howard et al. 2011). In the Philippines, during the 
2001 protest, again, mobile messaging played a prominent role as the place and 
time of the protest were coordinated through text messages (Shirky 2011). In the 
case of Occupy Wall Street, Twitter, Facebook, and YouTube played a significant 
role (DeLuca et al. 2012). These media are successful in mobilizing people since 
their decentralized structure allows for large-scale cascades of messages (González- 
Bailón et al. 2011). They also have a leading role in the first stages of protests. When 
traditional media start covering these protests, social media effects mix with those 
of traditional media (González-Bailón et al. 2013). 

The majority of the previous studies focused on how social media has helped to 
mobilize potential attendees of protests. However, only a few have focused on the 
interplay between street protests and their image in social media and recognized
mobile communication as a context for the creation of counter-narratives in street 
protests. One example is Neumayer and Stald (2014), who studied cases in Denmark 
and Germany. 

We will model the internal dynamics of typical street protests and their relation-
ships with the perception of the broader public that is active on social media. The
simultaneous changes in a protest, in terms of the number of attendees and slogans,
and the image of the protest in the broader public, in terms of popular topics, are
modeled with an agent-based approach. We will use two distinct cases to inform the
modeling process, one in Iran and another in Germany.

The emergence of social protests has been captured with threshold models of 
collective behavior by Granovetter (1978) and Kuran (1989). Both models assume 
that every person has an individual threshold defining the minimal percentage of the 
protesting population that convinces the individual to join the protest. The rational- 
choice interpretation of this threshold is that this is the value where expected benefits 
exceed the expected costs of protest. In that sense, a low threshold stands for a 
person with strong concerns who is easily motivated to protest, while a person with 
a high threshold has few concerns. A reasonably simplifying first assumption, also 
used by Granovetter, is that thresholds are normally distributed with a certain mean 
value greater than zero. The persons with thresholds of zero or below are the initial
protesters who can trigger others to start protesting as well, and so on, until a final number
of protesters is reached. For a normal distribution with a standard deviation below
a critical value, only a small fraction ends up protesting, while the protest cascades
to almost all people when the standard deviation is slightly above that critical value.
Granovetter's model assumes that every person has information about the global
fraction of protesters.
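The cascade logic of the threshold model can be made concrete with a short simulation. The following is a minimal sketch, assuming that a person joins as soon as the global fraction of protesters reaches his or her threshold; the function names, the population size, and the mean of 0.25 are illustrative assumptions rather than values taken from the studies cited above.

```python
import random


def granovetter_cascade(thresholds):
    """Iterate the global fraction of protesters until it stops changing.

    Each person protests once the current fraction of protesters is at least
    as large as his or her threshold (thresholds <= 0 are initial protesters).
    """
    n = len(thresholds)
    fraction = 0.0
    while True:
        new_fraction = sum(1 for t in thresholds if t <= fraction) / n
        if new_fraction == fraction:
            return fraction
        fraction = new_fraction


def normal_thresholds(n, mean, sd, rng):
    """Draw n thresholds from a normal distribution (illustrative helper)."""
    return [rng.gauss(mean, sd) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed only for reproducibility
    for sd in (0.05, 0.10, 0.15, 0.20):
        thresholds = normal_thresholds(1000, mean=0.25, sd=sd, rng=rng)
        print(sd, granovetter_cascade(thresholds))
```

Granovetter's well-known illustration is easy to reproduce with this function: if 100 people hold the thresholds 0, 0.01, 0.02, ..., 0.99, everybody ends up protesting, whereas raising the single threshold of 0.01 to 0.02 stops the cascade after the first protester.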

Granovetter's and Kuran's models can explain how large protest movements can
emerge suddenly. However, they do not take into account that individuals
might not assess the fraction of protesting people in the whole population but
only those in their immediate social networks. Watts (2002) applied the idea of 
Granovetter's threshold model to an undirected random network and showed that 
global cascades can be triggered by one protester while other protesters have 
equal thresholds (e.g., all 0.2) when the average number of links (i.e., association 
among protesters) is relatively low (between 1 and 6). The cascades, however, 
do not happen for denser networks. Dodds and Watts (2004) developed this 
model into a general model of contagion where individuals may receive several 
"doses" of motivating messages from others that sum up and trigger protesting 
when a threshold is exceeded. This model also includes compartmental models 
from epidemiology, such as the SI (susceptible-infectious) and SIR (susceptible- 
infectious-recovered) models, which are the canonical models for the spread of 
infectious diseases. In our model, we will use the threshold concept as well as the 
dose concept. We will assume that people can be activated to protest when a fraction 
of their social contacts is protesting. We will also assume that social media messages 
function as doses of concern with respect to particular topics for people who are not 
yet protesting. 
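To illustrate the local version of the threshold rule, the following minimal sketch activates an agent once the share of protesting neighbors reaches a common threshold, in the spirit of the network model described above. The random graph construction, the function names, and all parameter values are illustrative assumptions and do not reproduce the exact setup of Watts (2002) or of our own model.

```python
import random


def random_graph(n, mean_degree, rng):
    """Undirected random graph as a list of neighbor sets.

    Every pair of nodes is linked with probability mean_degree / (n - 1).
    """
    p = mean_degree / (n - 1)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def network_cascade(neighbors, threshold, seeds):
    """Return the set of active nodes once no further node can be activated.

    A node becomes active when the share of its neighbors that are active
    is at least `threshold`; isolated nodes never activate.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in enumerate(neighbors):
            if node in active or not nbrs:
                continue
            share = sum(1 for v in nbrs if v in active) / len(nbrs)
            if share >= threshold:
                active.add(node)
                changed = True
    return active


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed only for reproducibility
    for mean_degree in (2, 4, 8, 12):
        graph = random_graph(500, mean_degree, rng)
        final = network_cascade(graph, threshold=0.2, seeds={0})
        print(mean_degree, len(final) / len(graph))
```

Because activation is irreversible in this sketch, the order in which nodes are checked does not change the final set of protesters, only the number of sweeps needed to reach it.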

Lohmann (1994) modifies the Granovetter and the Kuran model to focus on
participation in a street protest as a costly political action that reveals information on 
how likely protesters deem political change. Klein and Marx (2018) pursue a similar 
idea but focus on explicit conversational information exchange between agents and 
asymmetric learning as a driving factor for the formation of mass movements. In 
their model, agents have a certain level of grievance and develop expectations of 
how likely political change is. When two agents meet randomly, one of them can 
access the other one's attitude. If asked, agents reveal their interest in change. 
However, since asking has a cost, only those critical of the system try to elicit their 
conversational partner's attitude. Hence, agents can learn from replies to questions 
that they ask themselves but also from others asking them or refraining from doing 
so. This results in asymmetric learning because agents who want change can learn 
from any interaction while supporters of the status quo only learn from interactions 
where their conversational partner and not they themselves are given a chance to ask. 
This means that genuine advocates of change deem change more likely than their 
peers who are content with the status quo. Moreover, the model shows that agents 
tend to underestimate the chance for change, while expectations are more accurate 
for societies with higher spatial or social mobility. Our model implements the idea 
that expression of one's concern is costly by implementing the possibility of agents 
sending unpolitical messages instead of expressing their concern. Furthermore, the 
above-mentioned possibility of social activation honors the fact that people decide 
whether to join a protest also based on information about the recent number of 
protesters. 

Similarly, Epstein's (2002) model of decentralized rebellion is mainly built on
activation, which is not only triggered by a high number of already activated agents in the
neighborhood but also inhibited by repression through the presence of cops. Like the
aforementioned models, Epstein's focus is also on fundamental rebellion and hence
only captures concern with a single topic (i.e., the current general situation), thereby
ignoring the possibility of multiple topics within a protest and agents influencing
each other with regard to these topics.

In her investigation of the development of news cycles, Waldherr (2014) touches 
upon different topics gaining or losing attention within a group of interacting agents. 
In her model, however, agents are journalists and they are interested in topics rather 
than concerned about them; thus, attention for a topic has no external consequences 
like the formation of a protest. 

Although our model builds on existing works on protest development as a 
question of costly choice, social exchange, and information retrieval, it goes beyond 
previous research by taking into account agents' genuine concern about multiple 
topics as well as their communicative exchanges on social media. In the following, 
we analyze two recent cases where protest dynamics and topical changes coevolve 
and use them to develop our agent-based model. 

2 Data


To capture significant similarities and differences in the street protest phenomenon,
we deliberately take data from two recent cases of protests from different cultures
and political systems. The Iran protests in December 2017 and January 2018
exemplify a short-term protest that heated up quickly and ended abruptly. In
contrast, PEGIDA in Germany since 2014 is a case of long-term motivation of
protesters and slow topic shift that can be considered a social movement. A social
movement is a sort of organization with specific goals (Della Porta and Diani 2009,
pp. 145-150), while a single protest, like the Iran case, may not have leaders or definite
goals. Protests, in the long run, may turn into a social movement.


2.1 Iran Protest in 2017/2018 


Our observations on this case stem from primary data analyses of videos and
photographs taken at the street protests depicting protest slogans, and of Iranian online
activity with regard to the protest, mainly on Telegram, the most popular social media
platform in Iran (Techrasa 2016).

On December 28, 2017, in Mashhad, capital of Khorasan Razavi province and 
the second largest city in the country after the capital city Tehran, a demonstration
took place in the main square of the city at the invitation of some hardline
fundamentalist/conservative political groups opposing the government of president
Rohani. The organizers aimed to put pressure on the president by focusing 
on economic problems and showing how people are supporting the opposition. 
However, within hours after the protest started, organizers could not control the 
crowd and slogans radicalized to the critique of the whole political system. Videos 
from the demonstrations were shared online to millions of people, and consequently, 
unlike previous Iranian street movements, protests started in more than 100 cities 
all over Iran (Rahmani Fazli 2018), with the most intense protests occurring in 
smaller cities. However, only minor physical conflicts with authorities arose, and 
there was less systematic oppression by the police in comparison with previous 
protests, because the government tolerated the protests as the president recognized 
the protesters’ right to express their concerns (Euronews 2018). Demonstrations and
their repercussions on social media faded out a week after the first protest.

Despite this short time frame, the protest experienced various shifts of focus
visible in daily changes of protest slogans. Overall, we could identify nine different
topics shown in Fig. 1. We identified these nine topics by watching all protest videos
posted in influential Telegram channels (73 videos) during the seven days of the
protests in more than 36 cities. In total, 78 different slogans were extracted from
these videos. Then, we categorized them into nine topics. Figure 1 also shows in
how many cities we found slogans from a given topic. Moreover, topics had different
fates over the seven days of protest, as some had uniform popularity and others had
more fluctuations in popularity.

Fig. 1 The number of cities with a topic of protest for the seven days of the protest in Iran

Figure 2 shows that different cities had a different number of topics on different
days. This underlines that there was no single dominant topic in the protest
and suggests that the movement was decentralized and spontaneous and hence
subject to internal dynamics instead of being led by political players pursuing a
specific agenda. For example, initial supporters condemned slogans against the
whole political regime, as some of them are strong defenders of it (Zand 2017). Most
political camps in Iran were surprised by the daily developments and hence confined
themselves to interpreting events in light of their own goals. This struggle for
interpreting the protest was clearly reflected on the front pages of political newspapers
(Khedmati 2018).

Ordinary non-protesting citizens also interpreted and discussed the developments 
within families, with colleagues or friends in person and on social media (BBC 
2018). Hence, despite the fact that only about 0.1% of the population joined street
protests directly (Rahmani Fazli 2018), a broad debate in society mirrored the
protest topics and the tensions between them, not only during the protest but still a month
after it.

We initially aimed to track the popularity of each protest topic on several social
media channels, namely Instagram, Telegram, and Twitter, day by day. This could
have shown how the street protest and online trends are interrelated and evolve
mutually. However, tools for analyzing social media turned out to be too expensive and
not sufficiently Persian-friendly.

Fig. 2 The number of popular topics in the whole country and four cities during the seven days of
protest in Iran

Google Trends, which has been used by many scholars (e.g., Choi and Varian 2012;
Mellon 2013; Mellon 2014; Minkus et al. 2019), was an alternative for our goal. We
searched for the popularity of street slogans on the web, which includes any kind
of text published on the internet (provided Google indexes it). This can be a reasonable
proxy for what was important to Iranians during the lifespan of the 2017/2018 protest.
Each topic consists of several slogans from the reviewed videos that were gathered from
influential Telegram channels. As Google Trends provides indices only for single words
or terms, it was impossible to search for trends of multiple slogans that together
constitute one topic. We therefore searched for single slogans in Iran in Google Trends.
Out of 73 slogans, there were data for 10 in Google Trends. The existence of common
trends between the number of protesting cities and the Google Trends index shows how
heated debates on the internet and in the streets were associated. This is illustrated in Fig. 3.

Figure 3 shows the different patterns of the positive relationship between the
popularity of 10 slogans on the internet and in the streets (in cross-correlation
analyses, eight of them had positive relationships). In most cases, the peaks of online
and street topics fall on the same day, or one slightly lags or leads the other. It is
thus plausible that street protests and debates on social media react to each other
and can have mutual influence.
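
To illustrate the kind of lagged comparison behind such a cross-correlation analysis, the following Python sketch correlates a daily street series with a daily online series at lags of one day in either direction. It is only a sketch under stated assumptions: the two series are invented placeholder numbers rather than data from the study, and the function names `pearson` and `lagged_correlation` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def lagged_correlation(street, online, lag):
    """Correlate street[t] with online[t + lag].

    A positive lag means the online series trails the street series;
    a negative lag means the online series leads it.
    """
    if lag > 0:
        a, b = street[:-lag], online[lag:]
    elif lag < 0:
        a, b = street[-lag:], online[:lag]
    else:
        a, b = street, online
    return pearson(a, b)


# Placeholder series for one slogan over seven protest days (NOT study data).
cities = [1, 4, 9, 12, 7, 3, 1]        # cities in which the slogan was chanted
trends = [5, 20, 60, 100, 70, 30, 10]  # Google Trends index for the slogan

for lag in (-1, 0, 1):
    print(lag, round(lagged_correlation(cities, trends, lag), 2))
```

A maximum at lag 0 would correspond to peaks on the same day, whereas a maximum at a positive or negative lag would indicate that one series trails the other. With only seven daily observations, such coefficients are descriptive at best.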

The Iran case is an example of protests that start fast and gather people with
different topics of interest. In this case, the protest is well-connected to interactions
among different actors on social media. The change in the popularity of topics in cities
and the parallel change in Iranians' online concerns are considerable.

Fig. 3 Examples for the number of protesting cities and Google Trends index during the lifespan
of the 2017/2018 Iran protest


2.2 PEGIDA, Germany Since 2014 and Ongoing 


The far right-wing populist movement “Patriotische Europäer Gegen die
Islamisierung des Abendlandes” (Patriotic Europeans against the Islamisation
of the Occident), or PEGIDA for short, was founded in closed social media groups
without party affiliation (Vorländer et al. 2018, p. 2). Soon, a public Facebook page
was launched for communication with protesters and general political statements.
The page was banned for violation of community standards but immediately
re-established (Vorländer et al. 2018, pp. 23-27). Weekly street protests started in the
city of Dresden in Saxony, Germany, and sparked protests in other German cities.
On October 20, 2014, about 350 protesters joined the first PEGIDA street protest
and expressed their worries about public demonstrations in support of different
parties in the Syrian civil war and Islamic extremism. However, this developed
into a rejection of Islam and Muslims in general over the next weeks. Moreover,
PEGIDA first criticized the local Dresden refugee policy and later the German
national one (Vorländer et al. 2016, pp. 5-7). Figure 4 shows that the number of
supporters rose steadily and peaked at 25,000 on January 12, 2015
(https://durchgezaehlt.org). After that, the number of protesters decreased to 2000-3000,
and PEGIDA organizers extended their topics with a general critique of the political
establishment and the traditional media. When the so-called refugee crisis in the
EU arose in summer and autumn 2015, several hundred thousand refugees arrived
in Germany and were sent to (often improvised) shelters across the country (Glorius
et al. 2018, p. 113). As a consequence, PEGIDA could mobilize more people
again. This resulted in a second peak of 15,000-20,000 protesters on October 19,
2015. After spring 2016, the figures declined again and have remained at about
2000 protesters until now (2019). These protesters still oppose migrants, traditional
media, and the German political establishment.

Fig. 4 Number of PEGIDA protesters between October 20, 2014, and November 26, 2015
(Source: Durchgezählt.org) and tendencies in topics lobbied in the protest as found on the PEGIDA
Facebook page and elaborated by Rucht et al. (2015) and Vorländer et al. (2016) based on social
media communication, slogans at the protest, and interviews conducted with protesters. Darker
shades mean higher tentative prevalence of the respective topic: (1) fear of radical Islam;
(2) critique of Dresden local refugee policy; (3) anti-feminist sentiment; (4) fundamental
criticism of political establishment and traditional media; (5) general xenophobia/anti-refugee
sentiment

During the protest peak time in December 2014 and January 2015, only a few
protesters interviewed or surveyed by different research teams (Vorländer et al.
2015; Rucht et al. 2015; Geiges et al. 2015) expressed particular issues with Islam,
which at the time was the main message conveyed by PEGIDA organizers on
banners at the protest or on social media. Instead, protest attendees were more
concerned about refugee policy in general, felt alienated from the political
establishment (which partially led to distrust in the political system altogether), and
were unhappy with traditional media news coverage of political events.

Overall, PEGIDA in Germany exemplifies a type of protest that starts without
connection to existing political actors and with no revolutionary ideas but focuses
on a narrow topic; however, people who are generally dissatisfied join the protest
and cause a shift of topics towards a more generalized critique of the political
establishment. Furthermore, PEGIDA shows that protests can persist with a small
number of supporters and, despite not reaching any of their goals directly, impact the
general political debate (Vorländer et al. 2018, p. 26). Finally, PEGIDA highlights
the importance of understanding the interplay of street protests and their social
media image as well as interactions of protest leaders and other political actors.


2.3 Stylized Data Facts 


Although PEGIDA in Germany and the 2017/2018 protests in Iran seem very 
different at first glance, our analysis revealed common structural features that a 
model should capture. 

In both cases, topics lobbied in the street shifted away from what organizers of
the initial protests intended. This was also noted for the Arab Spring by Hussain and
Howard (2013) and for Occupy Wall Street in the USA and anti-austerity protests
in Greece or Spain by Theocharis et al. (2015). One explanation for these shifts is
that protesters have diverse ideas. A few of these ideas can grow more popular and
dominate others over time. However, radical shifts occur not only with regard to
topics: the number of protesters can also change drastically, either
gradually or from one protest day to the next.

While activity on the street is an important factor for publicity and
acknowledgment of a protest, online space nowadays contributes substantially to the
success of a protest in two ways. Firstly, as spaces for debate, online social networks allow
protesters to spread their ideas and develop them further (cf. Lim 2012), as can be
observed, for example, on the PEGIDA Facebook page or in Telegram groups
of Iranian protest supporters. Secondly, people can also practically coordinate
upcoming street protests by, e.g., sharing information about the time and location,
which PEGIDA frequently does prominently by using its page profile and cover
pictures for that purpose.

Overall, differences between street protests seem not to be of a structural nature;
rather, one should look for the reasons for these differences in the specific
circumstances of the protest and protesters. However, these specific circumstances
often cannot easily be accessed empirically, or one may want to predict the fate of
future protests based on assumptions about the shape of the specific factors. Hence,
our model helps to assess how different circumstances relate to different protest fates.


3 Agent-Based Model


We model the evolution of street protests in terms of individual concerns, social 
activation, topic selection, and increasing concerns through topic propagation in 
social media. We only model the activation of protesters and the buildup of concerns 
and not the dampening and die-out of protests. Before specifying the details, we 
describe the basic idea. 

In our model, individuals are interconnected in a social network. Each
individual holds political concerns of different magnitudes on a finite set of topics.
Individuals have different protest thresholds, analogous to Granovetter (1978). They
begin to protest when at least one of their concerns is above the threshold. If this is
not the case, they may also join the protest when the fraction of protesting others
among their social contacts exceeds the threshold. Thus, individuals can protest because
of concerns or because of social activation. Both decisions are based on
the same individual threshold. We thus assume that individuals with a low threshold
are more susceptible to both types of activation: they will start to protest already
with low concerns or with a low fraction of protesting others.

An individual joining a protest decides on one of the topics to protest for. A concerned
protester selects one of the topics where the concern is above the threshold. Selection
is probabilistic with probabilities proportional to concerns. A protester protesting
because of social activation has no concern above the threshold and therefore selects
a topic from all topics, again with probabilities proportional to their concerns.

Furthermore, all individuals, protesting or not, post a message in their social network.
Protesters post that they joined a protest and for which topic. Individuals who did not
protest post a message unrelated to the protest. The next day, individuals with no
concern above their threshold read the messages in their social media news feed and
randomly pick one of them. When this message is protest-related, they increase their
concern on the topic of the message. This can be considered a dose of concern, analogous
to Dodds and Watts (2004). People who follow mostly non-protesting others will receive
mostly messages that are not protest-related and will thus likely not increase their concerns.

This whole process repeats daily and can trigger different fates of protests with
respect to the number of protesters and how prevalent the different topics are in the
protest. We implemented this model in NetLogo (Wilensky 1999). The model is
also provided for reference (Lorenz et al. 2019). It can be downloaded and run in
NetLogo, which is free to use.


3.1 Agents, Follower Network, Thresholds, and Concerns 


The agents in the model are individuals who are connected in a static directed
social network built upon initialization. The network can be interpreted as a follower
network where an agent can read the social media posts of the agents it follows,
but not necessarily vice versa. The follower network is created in two parts. First,
a directed preferential attachment network is built, representing links to potential
“celebrities” at various magnitudes of fame. This is built by successively creating
agents, where upon creation each agent follows a fixed number of already existing
agents (or all other agents for the very first agents). Agents are selected with
probabilities proportional to their current number of followers (plus one to give the
newest nodes a non-zero probability of being selected). Second, a friends' network
is built. A friend-link is represented by reciprocal follower relations. The friends'
network is a random network, where every possible link is created with a probability
selected such that the expected number of friends is a certain integer.¹ That way, we
mimic a typical network with mixed properties of social and information networks
showing, for example, an in-degree (followers) distribution more skewed and
fat-tailed than the out-degree (following) distribution (Myers et al. 2014).
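
The two-part construction described above can be sketched in Python as follows. This is a minimal sketch and not the NetLogo implementation: the function name `build_follower_network` and its arguments are hypothetical, the link probability `friends / (n - 1)` is one reading of "expected number of friends," and sampling the followed agents without replacement is an assumption.

```python
import random


def build_follower_network(n, following=5, friends=5, seed=None):
    """Return follows[i], the set of agents whose posts agent i can read."""
    rng = random.Random(seed)
    follows = [set() for _ in range(n)]
    followers = [0] * n  # follower counts from the preferential attachment part

    # Part 1: directed preferential attachment ("celebrity" links).
    for new in range(1, n):
        existing = list(range(new))
        if new <= following:
            targets = existing  # the very first agents follow all existing ones
        else:
            # Probability proportional to current followers plus one.
            weights = [followers[j] + 1 for j in existing]
            targets = set()
            while len(targets) < following:  # assumption: no duplicate targets
                targets.add(rng.choices(existing, weights=weights)[0])
        for j in targets:
            follows[new].add(j)
            followers[j] += 1

    # Part 2: random friends' network; a friendship is a reciprocal follow.
    p = friends / (n - 1)  # gives an expected number of `friends` per agent
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                follows[i].add(j)
                follows[j].add(i)
    return follows


network = build_follower_network(1000, following=5, friends=5, seed=16)
in_degree = [0] * len(network)
for followed in network:
    for j in followed:
        in_degree[j] += 1
print(max(len(f) for f in network), max(in_degree))
```

Comparing the largest out-degree with the largest in-degree gives a quick impression of whether the follower distribution is more skewed than the following distribution.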

As variables, each agent has a protest threshold, which stays constant after
initialization, and a vector of concern values on a certain number of topics. The
topics are the same for all agents; the concern values can change over time. Protest
thresholds are real numbers. Any number larger than one represents an agent who
will never join a protest for personal reasons but can impact others' decision to join.
Any value of zero or less represents an agent who will always protest regardless of
concern or protesting others. Concern values are non-negative integers less than or
equal to a certain maximal concern. In a world with five topics, the concern vector
[0 2 3 0 7] represents that the agent is most concerned about topic 5 and not at
all concerned about topics 1 and 4. Two other dynamic variables of agents are the
protest status, which is either “concern,” “social,” or “no,” and their topic of protest,
on which they post a message in social media. The protest topic is zero if an agent
is not protesting; otherwise, it is the topic (represented by its numerical label) the
agent chose for the protest. As this is chosen randomly with probabilities proportional
to concerns, this is usually a topic on which they have a high concern value. The
topic of protest represents a protest-related social media post which can be read by
followers. A topic of protest with the value zero represents a post not related to the
protest.


3.2 Agents' Activities


In the full model, there are three protest mechanisms: the decision to protest
because of concerns above threshold (concern protest), the increase of concern
through information from social media (social media concern), and the decision
to protest because many others in the social network protest (social activation).
These mechanisms can be independently switched on and off. In the following, we
assume that all are switched on.


¹ Besides a random graph, the NetLogo model also includes options to build the friends' network
based on a ring or on several cliques for robustness tests. These are not used in this work.

On each tick (which can be thought of as a day), agents perform the following activities.

Step 1. All agents that are not already concerned enough to protest read their
news feed and compute the fraction of the people they follow who protested. The
news feed represents the list of social media messages from the last day that an agent
reads. In the model, it is a list of the topics of protest and non-protest-related
messages of all agents the agent follows.

Step 2. All agents decide if they join the protest. 

Step 2.1. (Concern protest) An agent checks if a concern on at least one topic is
greater than the individual protest threshold times the maximally possible concern
value (a global parameter set to ten in the following). If this is the case, the agent
sets the protest status to “concern.”² We refer to this condition as “concern above
the threshold.” An agent with protest status “concern” then selects the topic of
protest from all the topics that are above the threshold randomly, with probabilities
proportional to concern values.

Step 2.2. (Social media concern) Agents without concerns above the threshold
read their news feeds and select one topic from this list at random.
This may well be zero, representing a message which is not protest-related. In this
case, nothing more happens. If it is a protest topic, the agent increases the
concern value on that topic by one. Note that agents with a concern above the
threshold will not increase any concerns anymore, because we assume that an agent
only needs one concern above the threshold to join, and we are only modeling the
buildup of one particular protest.

Step 2.3. (Social activation) Agents without concerns above the threshold check
if the fraction of people followed who protest is greater than the threshold. If this
is the case, the agent is socially activated and sets the protest status to “social,”
otherwise to “no.” If the agent protests, one of the topics is selected for the protest at
random with probabilities proportional to the concern values. This selection happens
even though none of the concerns is above the threshold. The rationale
is that for joining a protest, even when only socially activated, one needs a topic of
concern.
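
Steps 1 to 2.3 can be condensed into a single daily update, sketched below in Python. This is an illustrative sketch and not the NetLogo code: the names `Agent`, `weighted_topic`, and `tick` are hypothetical, and several details are assumptions, namely that all agents react to the previous day's posts simultaneously, that concerns are capped at the maximal concern, and that a socially activated agent with all concerns at zero picks a topic uniformly at random.

```python
import random
from dataclasses import dataclass

MAX_CONCERN = 10  # maximally possible concern value


@dataclass
class Agent:
    threshold: float
    concerns: list      # one integer per topic, each in 0..MAX_CONCERN
    status: str = "no"  # "concern", "social", or "no"
    topic: int = 0      # 0 = post unrelated to the protest, else topic label 1..K


def weighted_topic(concerns, candidates, rng):
    """Pick a topic label with probability proportional to its concern value."""
    weights = [concerns[k] for k in candidates]
    if sum(weights) == 0:  # assumption: uniform choice if all concerns are zero
        return rng.choice(candidates) + 1
    return rng.choices(candidates, weights=weights)[0] + 1


def tick(agents, follows, rng):
    """One day: follows[i] is the collection of agents that agent i follows."""
    posts = [a.topic for a in agents]  # messages posted on the previous day
    for i, a in enumerate(agents):
        bar = a.threshold * MAX_CONCERN
        above = [k for k, c in enumerate(a.concerns) if c > bar]
        if above:  # Step 2.1: concern protest
            a.status = "concern"
            a.topic = weighted_topic(a.concerns, above, rng)
            continue
        feed = [posts[j] for j in follows[i]]  # Step 1: read the news feed
        frac = sum(t != 0 for t in feed) / len(feed) if feed else 0.0
        if feed:  # Step 2.2: social media concern
            message = rng.choice(feed)
            if message != 0:
                k = message - 1
                a.concerns[k] = min(MAX_CONCERN, a.concerns[k] + 1)
        if frac > a.threshold:  # Step 2.3: social activation
            a.status = "social"
            a.topic = weighted_topic(a.concerns, list(range(len(a.concerns))), rng)
        else:
            a.status, a.topic = "no", 0


rng = random.Random(1)
agents = [Agent(0.5, [0, 7, 0]), Agent(0.9, [0, 0, 0])]
follows = [set(), {0}]  # agent 1 follows agent 0
for _ in range(2):
    tick(agents, follows, rng)
print([(a.status, a.topic) for a in agents])
```

In this two-agent example, the first agent protests out of concern from the first day on, and the second agent, who follows only the first, is socially activated one day later.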


3.3 Initial Conditions and Stopping Rules 


The initialization of a simulation run (setup procedure) is done as follows. First,
a fixed number of agents is created. Then, directed links are created using the
preferential attachment generator and, as described above, further reciprocal friends'
links are created in a random network. Network generation is steered by the
parameters following and friends, which describe the desired average number of
directed and reciprocal links an agent should have. Agents' static protest thresholds
are random numbers from a normal distribution with mean threshold level and
standard deviation threshold-dispersion. Agents' dynamic concern vectors consist of
topic-num integers between zero and max-concern. The concern value on each topic
is initialized as a binomial random number with probability initial-concern-level.
This implies that the expected concern on each topic for each agent is
initial-concern-level times max-concern. The three mechanisms of concern protest (CP),
social media concern (SMC), and social activation (SA) described above can all be
independently switched “on” and “off.” All “on” specifies the full model.


² The protest threshold is multiplied by the maximal concern to match the concern values (from,
e.g., {0, 1, ..., 10}) with protest thresholds (from [0, 1]).
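
The random initialization of thresholds and concerns can be illustrated with a short Python sketch. It is not taken from the NetLogo model: the function name `init_agent` is hypothetical, and drawing each concern as a binomial number with max-concern trials is an assumption inferred from the stated expected value of initial-concern-level times max-concern. The default values correspond to the settings used for the Iran case.

```python
import random


def init_agent(rng, num_topics=9, max_concern=10,
               initial_concern_level=0.1,
               threshold_level=0.5, threshold_dispersion=0.2):
    """Draw one agent's static threshold and initial concern vector."""
    # Normally distributed threshold; values above one or below zero are
    # possible and correspond to never-protesting or always-protesting agents.
    threshold = rng.gauss(threshold_level, threshold_dispersion)
    # Binomial(max_concern, initial_concern_level) draw for each topic.
    concerns = [
        sum(rng.random() < initial_concern_level for _ in range(max_concern))
        for _ in range(num_topics)
    ]
    return threshold, concerns


rng = random.Random(16)
population = [init_agent(rng) for _ in range(1000)]
values = [c for _, concerns in population for c in concerns]
print(sum(values) / len(values))  # close to 0.1 * 10 = 1
```

With these settings, the average initial concern is about one, far below the level of five that an agent with a threshold of 0.5 needs to exceed in order to protest out of concern.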

It turns out that only the five combinations CP, SA, CP-SA, CP-SMC,
and CP-SMC-SA are sensible configurations to distinguish. SMC alone will never
spark anyone to protest, and SMC-SA would fully coincide with SA with respect
to the protest status of agents. A logical analysis of all model variants shows
that an agent can only experience three types of transitions of the protest status:
“no” → “concern,” “no” → “social,” and “social” → “concern.” Therefore, the total
number of protesters (genuinely concerned or socially activated) can only increase
or stay constant. As already mentioned, modeling the decline of protests was
deliberately not our aim.

The concerns of agents can increase only when social media concern is switched on.
Thus, it is easy to see that in the CP regime the final number of protesters
is reached after one time-step, and in the SA and CP-SA regimes, the final number
of protesters is reached when the number of protesters stays constant for one
time-step. The chosen protest topics may change, but the distribution for the
probabilistic selection stays constant.

When social media concern is switched on, every agent who is at some point
following a protesting agent will successively increase the concern on at least one
topic in the long run. Consequently, the agent will turn into a concerned protester
unless the agent has a protest threshold above one, which rules out any protest.
Thus, in most configurations with social media concern, all agents with thresholds
below one end up with protest status “concern.” Furthermore, the concerns of agents
remain stable once they are concerned protesters. Therefore, the distribution
for the random selection of protest topics in the whole society also stabilizes once
all protesters have become concerned protesters. We use these insights to define the
stopping rules for simulation runs.
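
The convergence arguments above suggest a simple stopping check, sketched here in Python. The exact rules implemented in the NetLogo model are not spelled out in the text, so this is an assumption-laden illustration: the function name `should_stop`, its arguments, and the cap `max_ticks` are hypothetical.

```python
def should_stop(smc_on, tick, protesters_prev, protesters_now,
                statuses, thresholds, max_ticks=1000):
    """Decide whether a run has reached its final configuration.

    smc_on: whether social media concern is switched on.
    protesters_prev / protesters_now: protester counts of the last two steps.
    statuses, thresholds: per-agent protest status and protest threshold.
    """
    if tick >= max_ticks:  # hard cap as a safeguard
        return True
    if not smc_on:
        # Concerns are frozen and the number of protesters never decreases,
        # so one step without growth means the final number is reached.
        return protesters_now == protesters_prev
    # With social media concern: stop once every agent that can protest at
    # all (threshold below one) has become a concerned protester.
    return all(s == "concern"
               for s, th in zip(statuses, thresholds) if th < 1)


print(should_stop(False, 3, 120, 120, [], []))
print(should_stop(True, 40, 2, 2, ["concern", "social"], [0.4, 0.6]))
```

The second call returns False because one agent with a threshold below one is still only socially activated and may yet turn into a concerned protester.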


4 Simulation Experiment 


In our model exploration, we always used the same network generation parameters.
We assume that each agent follows five others (directed links in a preferential
attachment network) and has five friends (undirected links in a random network).
Furthermore, the maximum possible concern is fixed at ten. First, we work with
distinct values of the threshold level, initial concern, and threshold-dispersion to
represent the cases of Iran and Germany. Afterward, we explore the whole space
of the three parameters to gain general insights into the model mechanics that
help understand other protests as well as possible future developments in our two
example cases. When interpreting the simulation outcome, one should bear in mind
that the total population in the model does not necessarily represent everyone in
society but is conceptually limited to those active people who are linked to the protest
on social media and are generally sympathetic to the protest and any of its topics.
Moreover, the time-frame differs between protests depending on how dynamic
they are.


4.1 The Iran Case in the Model 


The empirical observation of the Iran case (see Sect. 2.1) can be translated to nine
topics and a low initial-concern-level (0.1), which mirrors the fact that the protests
broke out suddenly and were not connected to one specific rising concern. The
medium threshold level of 0.5 represents the fact that people generally had a sense of
urgency and willingness to express their views but, at the same time, would not
campaign on the streets lightly. Since the Iran protest evolved dynamically,
one should understand a time-step in the model as a few hours in reality, while the
population consists of all citizens who consider street protests an appropriate and
efficient way of expressing their political opinion, regardless of their political stripe.

Figure 5 depicts a simulation run using these parameter values. It compares
snapshots of the model world to the overall development of the number of protesters
and prevalence of topics in the protest. In the model world, non-protesting agents
are black, while protesting agents have the color of their protest topic shaded by
their concern value on that topic. Agents with protest status “concern” are shown as
filled dots, while the protest status “social” is indicated by white-filled circles. Agents
are arranged according to the structure of their preferential attachment (follower)
network, and an agent's size indicates its centrality in that network.

Figure 5 shows that initially, only a few people protest, but numbers then rise
steadily and quickly. The social dimension of the protest is most important after it has
gained considerable concern-driven support (starting at step 27); at this point, some
people would not yet have joined the protest out of concern, but they do join because
of social activation. This is crucial here because, when the same setting is run without
social activation, the protest dies out. However, social activation does not mean
that people are not concerned at all and only join because they want to meet their
friends on the street. Instead, the presence of people they know may raise one's
faith in the protest's success or even a sense of safety when protesting, as Klein and
Marx (2018) suggest. Once the protest has consolidated in numbers of protesters, the
socially motivated ones also become genuinely concerned (starting from step 36).
There are constant minor fluctuations between topics, showing that many people are
concerned about more than one topic and that slogans in the protest often depend on
up-to-the-minute news. This amounts to a different focus of protesters in different
clusters within the network (comparable to different cities in Iran; not visible in the
figure) and at different times.

Fig. 5 Simulation run of the Iran case (parameter setting cf. Table 1)


4.2 The Germany Case in the Model


PEGIDA is represented in the model with five topics and the same
initial-concern-level (0.1) as in the Iran case above but with a higher threshold level of
0.7. This higher threshold represents the fact that the issues of the PEGIDA movement
are more confined to a single topic area and also less severe for people's everyday
lives: in order to lobby a topic in the street protest, an actor has to perceive issues
with respect to that topic as particularly severe. One time-step in the model can be
identified with one day of the PEGIDA protest, which evolved more slowly than
the Iran case. The population also does not represent the whole political landscape
in Germany but includes only those possible protesters who sympathize with
right-wing ideas or are deeply disappointed with the political and social establishment.
Figure 6 shows one possible way in which a protest with these input
parameters can unfold. For about 50 steps, only a few people
protest, which corresponds to smaller actions taken by PEGIDA organizers and
core supporters prior to focusing on PEGIDA itself. However, after that initial phase,
people join quickly, and they mostly do so because of genuine concern, which they
have built up in the meantime, not because of social activation. That again is in line
with the empirical findings showing that despite organizers' appeal to moderate
views, protesters expressed genuine right-wing sentiment (Vorländer et al. 2018,
pp. 64-66). Initially, a variety of topics is present within the protest, and one of
them temporarily becomes the most important one (in reality it was “anti-refugee”).
However, later (at step 70) a second topic (in reality it was “critique of the political
establishment”) emerges, and most protesters eventually (at step 99) regard this as their
primary concern. That protesters become part of the same filter bubble (online and
offline) also contributes to the focus on a single protest topic. Nevertheless, the
first topic still maintains considerable support and is simply outnumbered by new
protesters supporting the new one. While social activation has a smaller influence
on the number of protesters than in the Iran case simulation, it is the main facilitator
for the second topic overtaking the first topic. Additional simulations with the same
realization of random events but without social activation show that the overtaking
phenomenon does not occur when social activation is turned off. In
reality, PEGIDA organizers frequently encouraged protesters to actively reach out
to their friends (Rucht et al. 2015, pp. 17-18), and while these friends may not have
shared a strong anti-refugee sentiment, they held a generally critical opinion of the
government and joined for this reason (Rucht et al. 2015, pp. 48-51).


4.3 Comparison Between the Iran and Germany Model Simulations


The parameter values for the simulations for the two cases are summarized in
Table 1. The similarity of parameter values for the Iran and Germany cases
indicates that they are structurally related. The cases differ in the perceived overall
importance of the protest cause, which translates to a lower threshold level in the
model of the Iran case. They also differ in the number of potential protest topics
(fewer in Germany). Furthermore, while the specific features of the empirical cases
(multiplicity of topics for Iran; single important topic and the crucial role of social
activation for Germany) are typical for the corresponding parameter values, the
effects are not guaranteed to occur under a given setting. This is because
case-specific features like the layout of the random friendship network or the
individual properties of central hubs in the preferential attachment network play
an important role in the simulation outcome, too. Such aspects are specific features
of a society (Vaisey and Lizardo 2010); the present model suggests that these
case-dependent societal aspects can explain why seemingly similar protests experience
different fates in empirical cases.

Fig. 6 Simulation run of the Germany case (parameter setting cf. Table 1)


4.4 Parameter Study 


In the following, we analyze how the final number of protesters and the distribution
of protest topics depend on the three protest mechanisms as well as the threshold
level and the initial-concern-level in the society. To that end, we made several
simulation runs using NetLogo's BehaviorSpace (cf. Lorenz et al. 2019), setting
all other parameters to the Iran case from Table 1. For an initial-concern-level of
0.1, we vary the threshold level from 0 to 0.8 in steps of 0.05. Furthermore, we also
vary the initial-concern-level in steps of 0.05 from 0 to 0.5, for a threshold level
of 0.5. We either run the simulation until a natural stopping criterion is reached
(cf. Sect. 3.3) or stop it after 1000 time-steps. This ensures that we reach a
final configuration where the number of protesters and the concerns do not change
anymore. What remains are stochastic changes in the protest topics selected, because
agents may have several topics above the threshold. We computed fifty simulation
runs for each configuration using no prespecified random seeds. Based on how often
each topic is selected for protest, one can compute an effective number of topics
$1 / \sum_i p_i^2$, where the summation is over all topics and $p_i$ is the number of agents
protesting for topic $i$ divided by the total number of protesting agents. This number
is analogous to the effective number of parties of Laakso and Taagepera (1979).
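
As a small worked illustration of this measure, the following Python sketch computes the effective number of topics from a list of protest topics, one entry per agent. The function name `effective_number_of_topics` and the convention of returning zero when nobody protests are assumptions made for this example; the topic lists are invented.

```python
from collections import Counter


def effective_number_of_topics(protest_topics):
    """Inverse sum of squared topic shares among protesting agents.

    protest_topics holds one topic label per agent; 0 marks a non-protester.
    """
    counts = Counter(t for t in protest_topics if t != 0)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: no protesters, no topics
    return 1.0 / sum((c / total) ** 2 for c in counts.values())


# Nine equally popular topics give an effective number of nine
# (up to floating-point rounding).
print(effective_number_of_topics(list(range(1, 10)) * 10))

# One dominant topic with a few minor ones gives a value only slightly above one.
print(effective_number_of_topics([1] * 90 + [2] * 5 + [3] * 5))
```

The measure thus counts topics weighted by their share: topics supported by only a few protesters contribute little to the total.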


Table 1 Parameter values for the simulation runs with respect to the two cases

Parameter               Iran case   Germany case
Population              1000        1000
Following               5           5
Friends                 5           5
Num-topics              9           5
Max-concern             10          10
Initial-concern-level   0.1         0.1
Threshold level         0.5         0.7
Threshold-dispersion    0.2         0.2
RandomSeed              16          17

The randomSeed specifies the sequence of random
events used in the NetLogo model (Lorenz et al.
2019)

Fig. 7 Results for the parameter study. The threshold level is the mean of the individual thresholds.
Note that the gray dots for the “CP and social media concern (SMC)” configuration in Panel A are
mostly covered by black dots


Figure 7 shows the results. Dots show data points for individual simulation runs. 
Lines show the mean value over all simulation runs for this parameter setting. 

Panel A shows the number of final protesters with respect to the threshold level. 
With only concern protest, the number of protesters smoothly declines with a rising 
threshold level. This is simply explained by initial conditions. With social activation, 
we have a threshold regime: At a critical threshold level of about 0.35, regimes 
change from a full protest to almost no protesters. This is similar to Granovetter
(1978) and Watts (2002). We have a mixed model with a network as in Watts and
heterogeneous thresholds as in Granovetter. This critical threshold is shifted slightly
upwards when concern protest and social activation are combined. When social 
media concern is added, protest builds up, sometimes quite slowly, until usually all 
individuals with thresholds below one protest. This holds whether the mechanism
of social activation is on or off. For relatively high threshold levels, a sizable
number of individuals have thresholds above one and therefore never protest.

Panel B shows the effective number of topics for the same setup. This is only of 
interest when social media concern is involved. In other cases, without social media, 
no particular structure of topics evolves. This implies an effective number of topics
around eight. An effective number of nine would only occur with perfect equality of 
all topics. Random fluctuation brings the number to about eight. The same happens 
with social media when the threshold level is relatively low but changes with higher 
threshold levels. With threshold levels around 0.5, a hierarchy of topics evolves 
with an effective number of topics around 5. With even higher threshold levels, 
this reduces to much lower numbers of effective topics, around 2. This essentially
means that one topic dominates while others only play a minor or no role. 

Panel C finally shows how the effective number of topics changes with increasing 
initial concerns for a fixed threshold level of 0.5. Panels B and C together show that 
a larger distance between threshold level and initial concern implies a lower number 
of effective topics. Only in these cases, social media has time to build a hierarchy 
of topics before an overall protest emerges. The combination of concern protest and 
social media with social activation implies a slightly higher number of effective 
topics. This happens due to the fact that socially activated actors bring new topics to 
the protest. 


5 Discussion 


The basic model version introduced in this paper can reproduce different empirically
observed fates of protests and describe mechanisms that possibly cause these fates.
We were able to show which properties of individual protesters were necessary to obtain
the patterns empirically observed in the Iran and Germany case studies. Combining
empirical studies of processes with an understanding of their possible causes in the model
is thereby key to understanding how the relation between individual decisions and online
exchange leads to street protests. The dying out of protest, however, was not part of this
study.

The parameters explicitly modified in our model only provide necessary condi- 
tions for a certain protest fate. Specific features of the society in which the protest 
takes place are important for its fate too. In the model, these features are the 
network layouts and especially the positions of agents with certain concern levels 
on certain topics within the networks. Moreover, this model version is deliberately 
kept simple and hence cannot capture other aspects of social media interactions and 
street protests in reality. In an extended model, the protest could be given a more 
active role in the sense of introducing interaction among protesters in the streets 
leading to additional concern change or additional messages sent from the protest 
or individual protesters to the social network. The decision of whether or not to join 
a protest may need revision to capture real actors' decisions more closely. Finally, 
one may consider including external influences that, at a pre-defined simulation step 
or when certain conditions are met, lead to an exceptional emotional dampening 
for some or all agents regarding a specific topic or across all topics. For example, 
government reactions, such as suppression or policy change, are not included yet. 
Abstract Non-State Armed Groups (NSAGs) operate in complex environments,
commonly existing as one of the many organizations engaged in one-sided violent 
attacks against the state and/or the civilian population. When trying to explain 
the execution and timing of these attacks, most theories look at NSAGs' internal 
organizational features or how these groups interact with the state or civilian 
population. In this study, we take a different approach: we use a self-exciting 
temporal model to ask if the behavior of one NSAG affects the behavior of other 
groups operating in the same country and if the actions of groups with actual ties 
(i.e., groups with some recognized relationship) have a larger effect than those with 
environmental ties (i.e., groups simply operating in the same country). We focus on
three cases where multiple NSAGs operated at the same time: Afghanistan, Iraq,
and Colombia, from 2001 to 2005. We find mixed results for the notion that the 
actions of one NSAG influence the actions of others operating in the same conflict. 
In Iraq and Afghanistan, we find evidence that NSAG actions do influence the timing 
of attacks by other NSAGs; however, there is no discernible link between NSAG 
actions and the timing of attacks in Colombia. Nevertheless, we do consistently find 
that there is no significant difference between the effect that actual or environmental 
ties could have in these three cases. 


Keywords Armed groups - Multi-party conflict - Attack timing - Hawkes 
process - Agent-based model (ABM) 


1 Introduction 


The escalation of violent attacks from NSAGs, in frequency and number of 
casualties, is a central threat to international security today. In 2017, while we saw 
a decrease in the overall number of inter-state armed conflicts, the number of armed 
actors engaged in one-sided violence increased noticeably (Pettersson and Eck, 
2018). Understanding the dynamics of violent attack execution better is essential 
for both scholarly research and policy-making. Efforts to effectively counter these 
threats would benefit from a more detailed understanding of the dynamics of their 
execution. 

NSAGs do not operate in isolation. Despite widespread assumptions made by 
theories of violence (e.g., in civil war research), armed conflict is rarely dyadic 
(Jentzsch, 2014). The portrayal of conflict both as combat between an incumbent 
state and a rebel organization, and of armed groups operating and making decisions 
in isolation, obscures the fact that multiple armed organizations commonly operate 
in the same conflict settings. To cite just one dramatic example, during the peak of 
the Syrian civil war, it was believed that as many as 1000 non-state armed groups 
were commanding about 100,000 fighters (BBC, 2013). 

Given the prevalence of conflict settings where multiple NSAGs operate, coop- 
erating and creating alliances (Christia, 2012; Horowitz and Potter, 2014; Gade 
et al., 2019) and/or competing against each other (Phillips, 2015; Gade et al., 2019), 
there are good reasons to believe that the behavior of a given group is not (or 
at least not entirely) independent from the behavior of other groups operating in 
the same environment. This interdependence might involve, for example, decisions 
of whether and/or when to commit a violent attack (either as a first mover or 
in response to others' attacks), as well as decisions to declare a ceasefire and/or 
sit down at the negotiating table with the government. Nevertheless, as Phillips
(2014, p. 336) notes, research often ignores the possibility that armed groups can 
affect each other in nontrivial ways. We contend that exploring these potential
interdependencies in the behavior of NSAGs in multi-party conflicts is a
promising avenue of research to understand various conflict
dynamics. Moreover, recognizing and identifying this interdependence might have
important implications for both security and peace policy. This is exemplified by 
the conflict trajectories in Eastern Congo where the enthusiasm that followed the 
military defeat of the M23 rebel group in 2015 was doomed by the fact that 69 
other NSAGs were also operating in the region (Stearns and Vogel, 2015). Similarly, 
violence has persisted in Colombia after sealing a historic peace deal with the 
FARC—the most extensive and powerful rebel group in the country's long-standing 
civil war—as other NSAGs have been fighting to fill in the power voids left by the 
rebels (Idler and Masullo, 2019). 

Does the behavior of one NSAG affect the behavior of other armed groups 
operating in the same environment? Furthermore, is behavior only affected by the 
more formal relationships that have captured the attention of the literature, or can it 
also be affected by the mere fact of operating in the same environment or fighting 
in the same conflict? In this paper we provide a preliminary exploration of these 
questions, focusing exclusively on the execution of one-sided violent attacks. To 
understand violent dynamics, instead of looking within armed organizations (see, 
e.g. Weinstein, 2007) or at the interactions of armed groups with civilian populations 
(see, e.g. Kalyvas, 2006) or with the government (see, e.g. Hultman, 2007), we focus 
on the relational environment in which NSAGs operate and examine how they might 
influence each other's actions—something that has received relatively less attention 
in the conflict literature. 

Our core contention is that the behavior of a given group is not independent 
of the behavior of other groups operating in the same environment. Specifically, 
we explore whether there is evidence suggesting that the timing of an attack by a 
given NSAG is affected by the attacks executed by others. If there is some sort of 
interdependence between attacks, in this preliminary exploration we should at least 
observe that once an armed organization commits an attack, the probability that 
another attack will take place increases. Moreover, if what is taking place is in some 
way related to this form of interdependence, we expect the impact of this additive 
effect to decay over time. 

We are not the first to explore how interorganizational relationships are related 
to armed group behavior. For example, Asal and Rethemeyer (2008) and Phillips
(2014) have looked at how the number of alliances affects group behavior, while
others have emphasized to whom the group is connected, noting that what matters
is the quality of the partners and the location of the tie within the overall network
(Horowitz and Potter, 2014). In this paper, we consider all the possible connections
between all the groups that are present in a given environment (what in social
network language would be a “fully connected network” or a “complete graph”) and
differentiate between actual ties (groups with a relationship as defined by the Big,
Allied, and Dangerous dataset¹) and environmental ties (groups simply operating
in the same country). This allows us to explore not only whether there are some 


1 See Sect. 2 for details on the dataset. 
interdependent dynamics in the timing of violent attacks, but also whether different
types of ties have different effects. 

Despite undeniable advances in our knowledge of how NSAGs operate, the study 
of interdependence between multiple groups operating in the same environment 
and its consequences on an armed group's violent behavior still requires more 
sophisticated empirical analysis and better theorization. By combining different 
methods to generate, analyze, and compare simulation and empirical data, this 
paper aims to make an empirical contribution to this task. Supportive evidence 
for interdependent behavior in the timing of violent attacks would constitute a
necessary first step to begin exploring these dynamics more deeply, disaggregating
data further (e.g., spatially, by the type of attack, armed group, and/or conflict type)
and exploring, theoretically and empirically, potential mechanisms.

This paper is structured as follows. In the next section, we present the empirical 
data that we used and the three cases that we chose to begin exploring whether 
NSAGs influence each other in attack timing and frequency. In Sect. 3 we introduce 
the methods detailing the fundamentals of both the analytical estimation from the 
empirical data and the generative model and simulation. Section 4 presents the 
main results and, to conclude, Sect. 5 briefly discusses the results, identifying some 
limitations of this study and delineating a road ahead. 


2 Data and Case Settings 


The empirical data for our study comes from the Global Terrorism Database (GTD). 
With information on over 180,000 domestic and international attacks between 
1970 and 2017, including kidnappings, assassinations, and bombings, this is the 
most comprehensive open-source database on terrorist attacks. While including 
more events than any other available dataset, it rests on clear inclusion/exclusion 
criteria. It excludes criminal incidents devoid of political or ideological motivations, 
incidents arising from clashes between opposing armed groups, and incidents 
perpetrated by the state. In this sense, it allows us to focus specifically on the 
outcome we are interested in: one-sided violent attacks by NSAGs.²

In addition, we use data on ties between armed organizations. These data come 
from Phillips (2015),³ which is an extension of the Big, Allied, and Dangerous
(BAAD) dataset (Asal and Rethemeyer, 2015). These ties are manually curated 
based on the Terrorism Knowledge Database, media reports, and legal documents 
and represent a variety of known incidents of activity between two groups, such as 
training another group's members, providing safe harbor, or collaborating on attacks 


2 For a detailed description of the data, including the history behind its collection, digitization,
and consolidation process, see LaFree and Dugan (2007). For recent updates and to access the
codebook and download the data, visit https://www.start.umd.edu/gtd/using-gtd/.


3 We are grateful to Phillips for sharing these data with us.
together.⁴ In this work, we treat the ties between organizations in an undirected
manner (i.e., groups in the relationships are coded regardless of who is addressing 
whom). 

For this exploratory analysis, we use three conflicts as our case studies— 
Afghanistan, Colombia, and Iraq—and restrict the period to 2001-2005. Multiple 
armed groups were active in each of these conflicts for the period under study, 
all executed a considerable number of violent attacks, and at least one actual tie 
was observed between two of them. In addition, these three conflicts provide us 
with variation in dimensions that can prove theoretically relevant: e.g., the macro- 
cleavage and the stage of the conflict, as well as the profile of the armed group. 
First, Colombia constitutes a clear example of an irregular, guerrilla type of war that 
started in the mid-1960s (four decades before our time frame) and pits left-wing 
rebel groups with Marxist-Leninist ideals against right-wing paramilitaries and the 
forces of the state. Second, the war in Afghanistan started in 2001 (the first year of
our time frame), shortly after the 9/11 attacks. Here a coalition of international forces and a
new Afghan government faced a local insurgency and multiple armed groups with
fundamentalist, sectarian, and sometimes ethnicity-driven characteristics. Finally,
the war in Iraq started in 2003 and ended the Baathist regime in Baghdad. Here,
as in Afghanistan, US-led forces faced a large-scale insurgency and
a number of terrorist organizations that targeted the international military presence
and the civilian population and aimed to disrupt the ongoing nation-building efforts.

For each of these countries, we carefully cleaned the data to make sure it included 
a meaningful and internally valid set of NSAGs and ties between them. Based on 
historical records, secondary sources and first-hand knowledge of each of these 
conflicts, we excluded organizations that we knew were not operating in the same 
environment and/or were mostly inactive during the time frame under analysis.⁵
Additionally, we merged observations when the same organization was coded as two
or three different ones given the use of various names in the sources used to build
the original dataset.⁶ This process considerably reduced the number of organizations
we are working with and explains the difference in the number of observations (both 
organizations and attacks) relative to GTD and Phillips (2014). 

Finally, to account for variation in NSAGs’ capacity to execute violent attacks, 
and inform the parameters in our simulated data with empirical priors (see Sect. 3.2), 
we estimated the capacity to commit a violent attack for each armed group that 
made it into our final list. Given that the measurement of our outcome of interest 
directly involves the number of attacks by each organization, we did not use this 
information to estimate capacity. Therefore, we relied on two different variables to 


4 For a more detailed description of these data, please refer to the source papers (Phillips, 2014,
2015; Asal and Rethemeyer, 2008).

5 For example, the IRA was originally included in the data for Colombia from 2001 to 2005.

6 For example, we merged groups that emerged from name changes during the conflict and were
a continuation of a group and its activities. During the transition, the names of these groups were used
interchangeably by international news outlets.
Table 1 Descriptives

                     Colombia   Iraq   Afghanistan
Armed groups         9          31     8
Actual ties          4          5      2
Environmental ties   32         460    26
Total ties           36         465    28
Attacks              428        181    428


proxy capacity, both from the GTD data: the number of attack types that each group
has engaged in and the number of deaths caused by their attacks.⁷

Table 1 summarizes the basic data for the three cases under study after we 
construct a fully connected network for each country. 
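
The tie counts in Table 1 follow from the fully connected construction: a complete undirected network of n groups has n(n − 1)/2 ties, and every tie that is not an actual tie is environmental. A small illustrative check in Python (the function name is an assumption, not part of the authors' code):

```python
def tie_counts(n_groups, n_actual_ties):
    """Total and environmental ties in a complete undirected network."""
    total = n_groups * (n_groups - 1) // 2
    if not 0 <= n_actual_ties <= total:
        raise ValueError("actual ties must lie between 0 and the total")
    return {"total": total, "environmental": total - n_actual_ties}


# (armed groups, actual ties) as reported in Table 1.
cases = {"Colombia": (9, 4), "Iraq": (31, 5), "Afghanistan": (8, 2)}
for country, (n_groups, n_actual) in cases.items():
    print(country, tie_counts(n_groups, n_actual))
```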


3 Methods 


To better understand the potential impact of all NSAGs’ actions within a country 
on the timing of the execution of attacks by specific NSAGs, we approached the 
problem with two distinct methodological strategies. The first consists of using 
empirical data to estimate the extent to which any NSAG’s action is reactive to any 
and all other NSAGs’ actions within the same environment. The second consists of 
constructing a generative model that implements a basic model of action to explore 
how much of an impact environmental ties could have and still produce an attack 
time series that is statistically indistinguishable from the actual historical record. 
These two tracks combined allow for an exploration of any evidence for a latent
influence of one NSAG’s actions on another's actions in the same setting.⁸


3.1 Analytical Estimation 


From the empirical data, we analytically estimated possible latent influences from
one NSAG’s attacks on the timing of another NSAG’s attacks with a Hawkes process
(Hawkes, 1971). A Hawkes process is a self-exciting point process that takes the
form of

$$\lambda_g(t) = \mu_g + \sum_{t_{g,i} < t} \alpha_g \, e^{-\beta_g (t - t_{g,i})} \qquad (1)$$


7 When combining these two variables, we did not use any weighting as we did not have any
theoretical reason to believe that either of the two variables should matter more than the other as
a proxy of an NSAG’s capacity to commit an attack. We also considered a large set of variables that
provide a good sense of capacity to commit an attack from the Non-State Actors in Armed Conflict
Dataset (NSA) compiled by Cunningham et al. (2013). However, as we found that many of the
relevant variables in this dataset were highly correlated for the organizations we were working
with, we decided to use weights only from the GTD.


8 All data and code are available at https://github.com/adamrpah/BIGSSS-Terror.
where $\mu_g$ is the base rate of “attacking” for a given NSAG $g$; $\alpha_g$ is the additive
effect that a prior attack has; $\beta_g$ is how quickly the additive effect from a prior attack
decays; $t - t_{g,i}$ is how long ago attack $i$ was carried out; and $\lambda_g(t)$ is the resultant rate
for NSAG $g$ at time $t$. Without the additive term that accounts for the influence
of prior attacks on the future attack probability, a Hawkes process reduces to a
Poisson process. Empirically, $\lambda_g(t)$ is the attack rate for an individual group during
the interval of interest.
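
The self-exciting rate described here, a base rate plus contributions from earlier attacks that decay over time, can be sketched in a few lines of Python. The exponential decay kernel, the function name, and all numerical values below are assumptions for illustration; they are not taken from the authors' repository.

```python
import math


def hawkes_rate(t, past_attack_times, mu, alpha, beta):
    """Base rate mu plus an exponentially decaying boost from each earlier attack."""
    boost = sum(
        alpha * math.exp(-beta * (t - attack_time))
        for attack_time in past_attack_times
        if attack_time < t
    )
    return mu + boost


# Hypothetical values: attacks on days 1 and 3, evaluated shortly after and later.
attacks = [1.0, 3.0]
print(hawkes_rate(3.1, attacks, mu=0.05, alpha=0.5, beta=4.0))
print(hawkes_rate(6.0, attacks, mu=0.05, alpha=0.5, beta=4.0))
# With alpha = 0 the rate is constant, i.e., a Poisson process with rate mu.
print(hawkes_rate(3.1, attacks, mu=0.05, alpha=0.0, beta=4.0))
```

The rate is highest immediately after an attack and relaxes back toward the base rate as the elapsed time grows.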

The mathematical extension of a Hawkes process from a single NSAG reacting 
to its own past actions (univariate Hawkes) to a NSAG incorporating signals from 
non-self actors (multivariate Hawkes) is relatively straightforward. We model the 
multivariate Hawkes process as 


$$\lambda_{g_j}(t) = \mu_{g_j} + \sum_{g_k} \sum_{t_{g_k,i} < t} \alpha_{g_k} \, e^{-\beta_{g_k} (t - t_{g_k,i})} \qquad (2)$$

where the impact of each attack $t_{g_k,i}$ is integrated with an individual $\alpha_{g_k}$ for each
NSAG $g_k$ when we calculate the rate $\lambda_{g_j}$ for a single NSAG $g_j$ at time $t$. We use
the pyhawkes package to estimate the parameters for this function, which uses
Markov Chain Monte Carlo (MCMC) for the estimation (Linderman and Adams,
2014).

A difficulty with an MCMC approach is identifying when the parameters
have converged. To ensure that our parameter estimation is robust, we use the scale
reduction factor (Rubin and Gelman, 1992) and continue sampling on all chains
until the variation is less than 0.01 on all $\mu$ parameters.
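
For readers unfamiliar with this diagnostic, the following is a minimal sketch of the potential scale reduction factor for a single parameter sampled by several chains of equal length. It is not the authors' code. Reading the stopping rule as the factor lying within 0.01 of 1 is an assumption, and the simulated chains are purely illustrative.

```python
import math
import random
from statistics import mean, variance


def scale_reduction_factor(chains):
    """Potential scale reduction factor for one parameter across equal-length chains."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean([variance(chain) for chain in chains])       # W
    between = n * variance([mean(chain) for chain in chains])  # B
    pooled = (n - 1) / n * within + between / n                # pooled variance estimate
    return math.sqrt(pooled / within)


# Hypothetical example: four chains sampling the same distribution.
rng = random.Random(0)
chains = [[rng.gauss(0.05, 0.01) for _ in range(2000)] for _ in range(4)]
r_hat = scale_reduction_factor(chains)
# Assumed reading of the stopping rule: stop once the factor is within 0.01 of 1.
print(r_hat, abs(r_hat - 1.0) < 0.01)
```

Chains that have not yet mixed have widely separated means, which inflates the between-chain term and pushes the factor well above 1.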

The only empirical information that the multivariate Hawkes process requires
is the attack time series for each NSAG within a setting. Since all parameters are
estimated from these time series alone, nothing in the estimation forces the $\alpha$
parameter that controls how much a NSAG reacts to the recent attacks of another NSAG
to be larger (or smaller) for actual than for environmental ties.


3.2 Generative Model and Simulation 


Our generative model implements a modified form of the multivariate Hawkes 
process (Equation 2) in order to theoretically explore the possible amount of 
influence of environmental information (attacks by NSAGs with no actual ties) on 
a NSAG's future attacks in comparison to the influence of the attacks of known 
NSAGs with which they have an actual tie. Our modified version of the multivariate 
Hawkes process for the generative model is 


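$$\lambda_{g_j}(t) = \mu_{g_j} + \sum_{g_k \in G} \sum_{t_{g_k,i} < t} w_{g_k} \, \alpha \, e^{-\beta (t - t_{g_k,i})} \qquad (3)$$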
Table 2 Parameters of the ABM

Parameter   Range                        Description
$\alpha$    {0.05, 0.1, 0.15, ..., 0.9}  Increase of intensity of impact from an attack
$\beta$     {4, 4.5, 5, ..., 8}          Decay rate of attack intensity
$w$         {0, 0.1, 0.2, ..., 1}        Weight of environmental ties


where $\alpha$ (the additive impact of past attacks), $\beta$ (the decay of influence over
time), and $w$ (the edge weight between two NSAGs) are general parameters of the
simulation (see Table 2 for simulation parameter ranges).
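
To give a sense of the size of this sweep, the parameter grid implied by Table 2 can be enumerated as follows. This is an illustrative sketch; the variable names are assumptions, and the step sizes are read off the ranges in the table.

```python
from itertools import product

# Ranges as listed in Table 2 (rounded to avoid floating-point drift).
alphas = [round(0.05 * i, 2) for i in range(1, 19)]  # 0.05, 0.10, ..., 0.90
betas = [4.0 + 0.5 * i for i in range(9)]            # 4.0, 4.5, ..., 8.0
weights = [round(0.1 * i, 1) for i in range(11)]     # 0.0, 0.1, ..., 1.0

grid = list(product(alphas, betas, weights))
print(len(alphas), len(betas), len(weights), len(grid))
```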

With Eq. 3, we simulate the multivariate Hawkes process for the set $G$ of NSAGs
in the country of interest. Unlike the analytical estimation, here we specify that all
NSAGs in $G$ are connected in a fully connected network with weight $w_{g_k}$ for each
tie that NSAG $g_j$ has. We set $w_{g_k} = 1$ when $g_j = g_k$ or there is an actual tie
between NSAG $g_j$ and $g_k$ from prior work (Phillips, 2015). If there is no actual tie
between NSAG $g_j$ and $g_k$, then we set $w_{g_k} = w$. Since the magnitude of $w_{g_k}$ is fixed
at 1 for actual ties, this allows us to systematically simulate the potential impact of
environmental ties on the timing of a NSAG’s attacks.

To restrict the potential parameter space, we estimate $\mu_g$ for each NSAG from
the empirical data. Since $\mu_g$ is the inherent rate of attacking for a NSAG, it could
be roughly estimated from the attack time series directly. However, we do not use a
NSAG's empirical attack rate so that no direct information about the attacking rate
is incorporated into the simulation. Instead, we regress the number of attack types
that a group engages in against the log total of casualties that the NSAG inflicts as
a proxy for a NSAG's capability to successfully launch attacks. To operationalize
this relationship as a rate ($\mu_g$) for each group, we scale the calculated value for each
NSAG by the average number of attacks per casualty across all NSAGs and reduce
that rate by 20% to account for possible latent influences.⁹

We simulate each run of our model for steps $t \in [0, 1865]$, where each step
is a day to match the time span in the empirical setting. At each time step $t$, we
enumerate all attacks for each NSAG up to time $t$ and recalculate $\lambda_g$ for each group.
For each group, we make a random draw from a Poisson distribution with $\lambda_g$ as the
mean and record that random value as the number of attacks at time step $t$.
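
The simulation loop just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the exponential decay kernel, the stdlib Poisson sampler, the function names, the three hypothetical groups, and all parameter values are chosen for the example only.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw from a Poisson distribution (Knuth's method, adequate for small means)."""
    limit = math.exp(-mean)
    count, running = 0, rng.random()
    while running > limit:
        count += 1
        running *= rng.random()
    return count


def tie_weight(g, h, actual_ties, w_env):
    """Weight 1 for a group's own attacks and for actual ties, w_env otherwise."""
    if g == h or frozenset((g, h)) in actual_ties:
        return 1.0
    return w_env


def simulate(mu, actual_ties, alpha, beta, w_env, n_days, seed=None):
    """Return {group: [(day, n_attacks), ...]} for one simulated run."""
    rng = random.Random(seed)
    groups = list(mu)
    attacks = {g: [] for g in groups}
    for t in range(n_days):
        # Rates for day t depend only on attacks recorded on earlier days.
        rates = {}
        for g in groups:
            excitation = 0.0
            for h in groups:
                weight = tie_weight(g, h, actual_ties, w_env)
                excitation += weight * sum(
                    n * alpha * math.exp(-beta * (t - day)) for day, n in attacks[h]
                )
            rates[g] = mu[g] + excitation
        for g in groups:
            n_attacks = poisson_draw(rng, rates[g])
            if n_attacks:
                attacks[g].append((t, n_attacks))
    return attacks


# Hypothetical three-group setting; names and values are illustrative only.
mu = {"A": 0.05, "B": 0.02, "C": 0.01}
actual_ties = {frozenset(("A", "B"))}
run = simulate(mu, actual_ties, alpha=0.5, beta=4.0, w_env=0.3, n_days=200, seed=1)
print({g: sum(n for _, n in events) for g, events in run.items()})
```

Setting `w_env` to 0 switches off all influence between groups without an actual tie, which is the comparison the parameter sweep over $w$ is designed to make.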

To assess whether a set of simulation parameters generate realistic time series 
for a setting, we perform 500 independent runs for each unique set of simulation 
parameter values. To statistically test if the simulated data differ from the empirical 


9 The rate reduction of 20% is an arbitrary constant to ensure that $\mu_g < \lambda_g$. The exact percentage
chosen is not important since we are performing a parameter sweep with the simulation. If there
is an effect from the attacks of others on the timing of a NSAG's attacks, then a rate reduction
that is too small would simply increase the magnitude of the $\alpha$ parameter (and vice versa if
the rate reduction was too large). If there is not a systematic effect from the attacks of others,
then the landscape of feasible parameter combinations should be rugged (i.e., isolated parameter
combinations that duplicate the empirical time series) which would be an artifact of the rate
reduction chosen.
data, we test if the inter-event time distribution for each NSAG differs from the
empirical inter-event time distribution with the Kolmogorov-Smirnov test. Across
all runs for a simulation parameter set with a typical threshold of p = 0.05, we
would expect 5% of the inter-attack time series to differ by chance alone. Thus, if
more than 5% of the inter-event series are significantly different, we conclude that
the given simulation parameter set fails to reproduce the empirical data.
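
The acceptance rule can be sketched as below. The sketch assumes that the p-values have already been obtained from a two-sample Kolmogorov-Smirnov test (for instance `scipy.stats.ks_2samp`, which is not called here), one per simulated inter-event series. Function names and example numbers are illustrative assumptions.

```python
def inter_event_times(attack_days):
    """Gaps between consecutive attacks, given the days on which attacks occurred."""
    days = sorted(attack_days)
    return [later - earlier for earlier, later in zip(days, days[1:])]


def parameter_set_reproduces(p_values, significance=0.05):
    """True if no more than the chance share of series is significantly different."""
    if not p_values:
        raise ValueError("need at least one p-value")
    rejected = sum(p < significance for p in p_values)
    return rejected / len(p_values) <= significance


print(inter_event_times([10, 3, 7]))
# Hypothetical p-values: 4 of 100 series differ significantly, within chance.
print(parameter_set_reproduces([0.5] * 96 + [0.01] * 4))
# Hypothetical p-values: 10 of 100 series differ significantly, more than chance.
print(parameter_set_reproduces([0.5] * 90 + [0.01] * 10))
```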


4 Results 


4.1 Analytical Estimation of Basal and Additive Rates 


After estimating Eq. (2) from the empirical data, we can establish the baseline
differences between individual NSAGs. Primarily, we aim to further clarify how $\mu$,
which represents an individual NSAG’s basal rate and capacity to attack, compares
to the additive component $\alpha$, the reactionary attacks in response to attacks from
others.

We find a dramatic difference between the magnitude and proportionality of $\alpha$
and $\mu$ in our case studies (Fig. 1). In Colombia, the additive effect of another group's
attack on a group's own attacking rate is small in comparison to its basal attack rate.
FARC stands as a stark outlier in comparison to all groups, with its $\mu$ more than two
orders of magnitude larger than the additive $\alpha$ effect. This would suggest that the
timing of FARC attacks has almost no relationship to, or influence from, prior attacks
by any group within the region.

In contrast, in both Iraq and Afghanistan the additive effect $\alpha$ is larger than $\mu$ for
a majority of NSAGs, with multiple groups having an $\alpha$ value that is two or nearly
three orders of magnitude larger than $\mu$. Under the assumptions of the Hawkes process, this
means that the timing of the majority of attacks that NSAGs commit is more a
response to the actions of other NSAGs in the region than independently
determined.


Fig. 1 Comparison of the estimated $\mu$ for each NSAG against the $\alpha$ coefficient it has for every
other NSAG in the same country. The plotted line is for $\mu = \alpha$
Fig. 2 Comparison of the calculated prior $\mu$ to the analytically estimated $\mu$ for
NSAGs in all three countries (axes: Prior and Analytical Estimation)


However, if we examine the difference between actual and environmental ties
amongst these three settings, we find no significant differences (p = 0.24, t-test). If
we exclude all ties that originate from Colombia, this result still holds (p = 0.41,
t-test). This suggests that despite the differences found in the settings in terms of
the impact of the additive effect $\alpha$, whether a tie is actual or environmental is not a
major contributing factor.

As a check for the initialization parameters for the generative model, we also
compare the analytically estimated $\mu$ values with the estimated $\mu$ priors for the
generative model and find a good agreement overall (Fig. 2, $R^2 = 0.49$). The
notable outlier is again FARC, which has an analytically estimated value that
is approximately 150% of its calculated prior. This suggests that the estimation
strategy is a good, general approximation of a NSAG’s $\mu$ without the usage of
empirical rate-related information.


4.2 Comparison of Inferred Networks to the Network of Actual
Ties


To better understand the implications of the inferred $\alpha$ parameters on the entire
network of relationships within a country, we explicitly construct an inferred
network where $\alpha$ serves as tie strength and compare it to the network of actual
ties amongst NSAGs (Fig. 3).¹⁰


10 We set a threshold on the strength of $\alpha$ for creating a connection at $\alpha > 10^{-3}$.
Fig. 3 Actual and inferred networks. Circles indicate NSAGs and lines indicate ties. Lines colored
in red indicate the ties present in both the actual and inferred networks


As we would expect from the previous results, the inferred networks for 
Afghanistan and Iraq are densely connected. Interestingly, we infer no relationship 
for almost half of the actual ties in Iraq and Colombia, while we capture both of the 
actual ties in Afghanistan. These results suggest that groups react to a number of 
other NSAGs within the same country that are not captured through formal known 
ties and that not all formal known ties are equivalent in nature. 

To better quantify how different the inferred network is in each country, we 
compute the normalized degree and transitivity statistics (Table 3). The normalized 
degree accounts for the number of ties that a NSAG has given the number of possible 
ties in the network, which allows for a comparison between countries. We find that
in Colombia the normalized degree grows by 50% from the network of actual ties to
the inferred network (from 0.40 to 0.60), while the growth in degree in Afghanistan and Iraq is over
1000%. This again highlights the difference between countries, with the inferred network
Table 3 Descriptive statistics for observed and estimated networks

              Average normalized degree   Transitivity
              Actual    Inferred          Actual    Inferred
Afghanistan   0.07      0.93              0.00      0.93
Colombia      0.40      0.60              0.60      1.00
Iraq          0.01      0.98              0.50      0.98


of groups that react to one another barely differing from the known network in 
Colombia and these two networks drastically diverging in the emerging conflicts in 
Afghanistan and Iraq. 

Transitivity measures the amount of triadic closure, that is, whether A is also 
connected to C when A is connected to B and B is connected to C. In this context, it would 
imply that if group A reacts to actions taken by group B, then it would also react to 
actions taken by group C so long as group B also reacts to C. A larger transitivity 
value in this context would imply that the total volume of attacks in a country is 
more dynamic since an attack made by one group could start a chain of reactionary 
attacks from its connected neighbors (given a sufficiently large α). Surprisingly, 
despite the difference in transitivity for the actual ties networks, all three countries 
have similar levels of transitivity in the inferred networks, meaning that for groups that we have inferred 
ties for, those ties are closed into triads. In terms of the change from the actual ties 
network to the inferred ties network, the increase is most notable in Afghanistan. In 
the observed network, only the Taliban is connected to Jaysh al-Muslimin and 
Al-Qaeda, producing a transitivity of 0, while the inferred network transitivity is 0.93. In 
Colombia, the change in transitivity is not as dramatic as in Afghanistan, 
but it is notable that FARC is isolated in the inferred network, in contrast 
to its known ties to the ELN (the second largest left-wing rebel group). 
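
The two statistics in Table 3 can be illustrated with a minimal sketch. It uses only the three connected Afghan groups named above, so the degree value it produces is not comparable to the table, which is computed over all groups in the country. The function names are our own, hypothetical choices and not part of any package used in the chapter.

```python
from itertools import combinations

def build_neighbors(nodes, edges):
    """Map each node to the set of nodes it shares an undirected tie with."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors

def average_normalized_degree(nodes, edges):
    """Mean over nodes of degree / (n - 1), the largest degree a node could have."""
    neighbors = build_neighbors(nodes, edges)
    return sum(len(neighbors[n]) / (len(nodes) - 1) for n in nodes) / len(nodes)

def transitivity(nodes, edges):
    """Share of connected triples (A-B-C paths) whose end points are also tied."""
    neighbors = build_neighbors(nodes, edges)
    triples = closed = 0
    for center in nodes:
        for a, b in combinations(sorted(neighbors[center]), 2):
            triples += 1
            closed += b in neighbors[a]
    return closed / triples if triples else 0.0

# Illustrative sub-network only: the Taliban tied to two other groups.
groups = ["Taliban", "Jaysh al-Muslimin", "Al-Qaeda"]
observed = [("Taliban", "Jaysh al-Muslimin"), ("Taliban", "Al-Qaeda")]
closed_triad = observed + [("Jaysh al-Muslimin", "Al-Qaeda")]

print(transitivity(groups, observed))      # 0.0: the single triple is open
print(transitivity(groups, closed_triad))  # 1.0: every triple is closed
```

Adding the one missing tie moves transitivity from 0 to 1, which is the kind of jump observed between the actual and inferred Afghan networks.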

These statistics correspond to the stark visual discrepancies between the actual 
and inferred network structures. Despite the fact that the actual ties are records of 
known, shared activities (whether that be training recruits, planning attacks, etc.), 
we find that the actual network lacks the majority of ties, and the resulting structure, 
identified in the inferred network. Given our focus on the timing of attacks, this 
suggests that the “strategic” actual ties do not capture the range of actors and actions 
that a NSAG may take into account when choosing exactly when to execute an 
attack. 


4.3 Generative Model Results and Correspondence to 
Analytical Findings 


To better interpret our generative model, we plot our sweep through the three 
parameter ranges for each country, focusing on the parameter region where the 
model successfully replicates the empirical data (Fig. 4). Once again, Colombia 
stands out since no parameter combination was able to produce NSAG time series 
similar to the empirical data. This is in general agreement with the analytical 
findings, since FARC has a μ value that is nearly three orders of magnitude larger 
than any additive effect from another NSAG’s actions and conducted more than 
50% of the attacks during the study period. Even the ELN, the second most active 
rebel/insurgent group and the one that conducted nearly 20% of the attacks during 
the period of analysis, has an estimated μ value that is roughly one order of 
magnitude larger than the additive effect from any other NSAG’s actions. 

Fig. 4 Generative model results for defined parameter combinations (α, β, ω) in Afghanistan, 
Colombia, and Iraq. Each cell is the result of comparing the inter-event time series for 500 
generative runs against the empirical data with the Kolmogorov-Smirnov (K-S) test. If fewer than 
5% of the runs differ from the empirical data, the cell is marked as ‘Pass,’ otherwise, it is marked 
as ‘Fail.’ Each grid in a column is a different α value (the additive effect from previous attacks), 
while β (the rate of decay of previous attack effects) is on the horizontal axis of each graph. The 
vertical axis for each graph is ω, which is the relative strength of environmental to actual ties 
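
The pass/fail rule described in the caption of Fig. 4 can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a two-sample K-S statistic with the standard asymptotic critical value at the 0.05 level, whereas the test variant and significance level actually used are not restated here. The function names and the toy data are hypothetical.

```python
import math

def ks_statistic(x, y):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    gap = 0.0
    while i < len(xs) and j < len(ys):
        value = min(xs[i], ys[j])
        while i < len(xs) and xs[i] <= value:
            i += 1
        while j < len(ys) and ys[j] <= value:
            j += 1
        gap = max(gap, abs(i / len(xs) - j / len(ys)))
    return gap

def differs(x, y, level=0.05):
    """True if the samples differ at the given level (asymptotic critical value)."""
    n, m = len(x), len(y)
    critical = math.sqrt(-math.log(level / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(x, y) > critical

def cell_label(simulated_runs, empirical, max_share=0.05):
    """'Pass' if fewer than max_share of the runs differ from the empirical data."""
    rejected = sum(differs(run, empirical) for run in simulated_runs)
    return "Pass" if rejected / len(simulated_runs) < max_share else "Fail"

# Toy inter-event times (in days); real cells would use 500 generative runs.
empirical = list(range(1, 51))
matching_runs = [list(range(1, 51)) for _ in range(20)]
shifted_runs = [[t + 100 for t in range(1, 51)] for _ in range(20)]

print(cell_label(matching_runs, empirical))  # Pass
print(cell_label(shifted_runs, empirical))   # Fail
```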

The results for Afghanistan and Iraq present a different picture, as both cases 
yielded model parameter combinations that successfully reproduce the empirical 
data. Since there are only two successful combinations in Afghanistan, the 
interpretation is straightforward: the empirical data are reproduced only when groups 
react nearly as much to the actions of other groups as they do to the actions of 
their allied groups. Iraq is similar to Afghanistan insofar as there must be a 
high additive amount to the rate from each attack in order for the simulation 
to statistically reproduce the empirical data. There is an expected dependence 
on the strength of environmental ties: as this link becomes stronger, the range 
of permissible α values grows, to the point that nearly half of the parameter 
combinations are successful when ω = 0.5. The primary difference is that in this 
case not even a weak dependence on how quickly the impact of prior attacks decays 
seems to play a role. This could be due to the large number of NSAGs operating 
in this environment and the limited number of attacks that each group committed 
during the period of analysis (31 NSAGs operated during this period, committing an 
average of 5.84 attacks). However, the results for both Afghanistan and Iraq confirm 
the analytical estimates, with both countries having average α values that are larger 
than the average μ, suggesting that reactionary attacks play a role in the timing of 
attacks in these two countries. 
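
To make the comparison between α and μ concrete, the sketch below assumes a mutually exciting point process with an exponential kernel, in which each past attack by another group adds α to a group's attack rate and this boost decays at rate β. This functional form, the weighting of non-allied groups by ω, and all names and numbers are assumptions made for illustration; the exact specification is given in the model section of the chapter and may differ.

```python
import math

def attack_rate(t, mu, alpha, beta, omega, past_attacks, allies):
    """Assumed rate at time t: base rate mu plus decaying boosts from past attacks.

    past_attacks is a list of (time, group) pairs for attacks by other groups.
    Attacks by allied groups count fully; all others are scaled by omega.
    """
    rate = mu
    for attack_time, group in past_attacks:
        if attack_time >= t:
            continue  # only attacks strictly before t can influence the rate
        weight = 1.0 if group in allies else omega
        rate += weight * alpha * math.exp(-beta * (t - attack_time))
    return rate

# Hypothetical values: a low base rate and a comparatively large additive effect.
history = [(0.0, "ally"), (0.5, "other")]
quiet = attack_rate(1.0, mu=0.01, alpha=0.5, beta=1.0, omega=0.5,
                    past_attacks=[], allies={"ally"})
reactive = attack_rate(1.0, mu=0.01, alpha=0.5, beta=1.0, omega=0.5,
                       past_attacks=history, allies={"ally"})
print(quiet, reactive)
```

When α is large relative to μ, as in this toy setting, recent attacks by others dominate the rate, which is the sense in which reactionary attacks shape timing.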


5 Conclusion 


Our preliminary answer to the guiding question of “does the behavior of one NSAG 
affect the behavior of other armed groups operating in the same environment?" is 
nuanced. We find clear support in both Iraq and Afghanistan that the actions of other 
groups affect the timing of attacks; however, no support is found for this notion in 
Colombia. This mixed result with only three case studies prevents us from drawing 
any general conclusions on the notion that previous attacks from other groups affect 
the future timing, and thus frequency, of NSAG attacks within a region. 

Quantitatively, it is necessary to expand the analysis to a larger sample with more, 
if not all, countries with active NSAGs. There are recorded attacks in over 200 
countries (without disambiguation for geographical renaming) in the GTD, which 
would provide enough statistical power to test this hypothesis exhaustively. If the 
effect were general across countries, then greater insight could be obtained from the 
estimated magnitude of the impact that prior actions have on the future timing of 
attacks. 

Qualitatively, our initial analysis would suggest that the general stage of the 
conflict that NSAGs are engaged in is a contributing factor as to whether or not prior 
attacks influence future attack timings. During the study period, both Afghanistan 
and Iraq were severely destabilized with the large-scale introduction of foreign 
forces and joint political instability. In both settings, this resulted in constantly 
changing circumstances for insurgents, global jihadi movements, foreign powers, 
and counterinsurgent forces. In contrast, the conflict in Colombia is one that has had 
the same dominant actors for years before the study period. This is demonstrated 
by the fact that between 1980 and 2002, hijacking, hostage taking, and kidnapping 
totaled 76.3% of attacks conducted by FARC, the most dominant actor in Colombia 
(Eccarius-Kelly, 2012). By comparison, hostage taking constitutes only 7.2% of 
attacks conducted by the Taliban, the most dominant actor in Afghanistan. These 
differences in attack type underscore the qualitative difference in the nature of the 
conflicts we study here. 

Importantly, we find no difference in the relative importance of actual ties to 
environmental ties. If we examine the results from the analytical model, the mean tie 
strength does not differ between actual and environmental ties. From the generative 
model, we find that only as the strength of the additive effect increases can we 
successfully replicate the empirical data. In Afghanistan, successful reproductions 
only occur when environmental ties are weighted nearly as much as actual ties. In 
Iraq, the space of permissible ω values increases as the additive effect, α, increases, 
since an increase in the magnitude of the effect compensates for the weakening tie 
strength. While alliances and rivalries, the basis of our actual ties, have been shown 
to have long-term effects on group outcomes (Phillips, 2015; Asal and Rethemeyer, 
2008), there does not appear to be any relationship to a short-term effect, specifically 
one on the timing of attacks. While acts of cooperation, such as one group training 
another group or coordinating attacks, do relate to the timing of individual attacks, 
they do not appear to rise to a systematic level of how attacks are coordinated and 
executed. 

Without expanding the scale of the study, it is not possible to discern if this 
study is simply further proof of the empirical limits on predicting the timing of 
NSAGs’ attacks (as demonstrated by the failure of the approach in Colombia) or if 
it is only in specific contexts that prior actions can inform future predictions. We can, 
at least, state that while formal relationships and the position that a group occupies 
within that formal structure have a long-term impact, these relationships do not have 
a systematic impact on daily activities and their timing. 
Abstract Why do some conflict zones exhibit more violence against civilians 
than others? In answering this question, the literature has emphasized ethnic 
fractionalization, territorial control and strategic incentives, while overlooking 
the consequences of armed conflict itself. This oversight is partly due to the 
methodological hurdles of finding an appropriate counterfactual for observed battle 
events. In this chapter, we aim to test empirically the effect of instances of armed 
clashes between rebels and the government in civil wars on violence against 
civilians. Battles between belligerents may create conditions that lead to surges 
in civilian killings as combatants seek to consolidate civilian control or inflict 
punishment against populations residing near areas of contestation. Since there is 
no relevant counterfactual for these battles, we utilize road networks to help build a 
synthetic risk-set of plausible locations for conflict. Road networks are crucial for 
the logistical operations of a civil war and are thus the main conduit for conflict 
diffusion. As such, the majority of battles should take place in the proximity of 
road networks; by simulating events in the same geographic area, we are able to 
better approximate locations where battles hypothetically could have occurred but 
did not. We test this simulation approach using a case study of the Democratic 
Republic of the Congo (1998-2000) and model the causal effect of battles using 
a spatially disaggregated framework. This work contributes both substantively and 
methodologically to the literature on micro-foundations of civil war and reactive 
violence in two main ways: (1) It offers a tentative framework for crafting synthetic 
counterfactuals with event data. (2) It proposes an empirical test for explaining the 
variation of violence against civilians as a result of battle events. 
1 Introduction 


The consequences of violence against civilians (VAC) within the context of civil 
wars continue to be a debated topic with, at times, contradictory findings. Some 
argue that the use of violence against civilians is counterproductive to incumbents’ 
goals (Kalyvas 2006; Kocher et al. 2011; Lyall 2017) while others find the 
opposite (Lyall 2009; Stoll 1993). Departing from this debate, a growing literature 
has sought to better understand VAC as a dependent rather than independent 
variable. Consequently, numerous studies have examined factors that may explain 
VAC, including ethnic fractionalization, territorial control, strategic incentives, and 
various geographic variables (Fjelde and Hultman 2014; Schwartz and Straus 2018; 
Raleigh 2012; Wood 2010; Schutte 2015, 2017). This chapter aims to contribute 
to this body of literature by using a geographic event-based approach to further 
investigate the occurrence of violence against civilians. 

Building on theories of VAC being used as a tactic by warring parties, we 
proceed to ask the following question: what effect do instances of armed conflict 
actually have on violence against civilians? We suspect that geographical factors, 
specifically road networks, are crucial for the logistical operations of a civil 
war, thus making areas around these road networks more prone to conflict in 
general. Are actual, observed battles between incumbents and insurgents causing 
the variation in violence against civilians or does indiscriminate violence simply 
occur in these more conflict-prone areas, even where battles do not necessarily take 
place? By addressing this question using georeferenced micro-level data, we hope to 
increase our understanding about the relationship between conflict waged between 
combatants and violence experienced by civilians. 

This chapter begins with a brief overview of the literature that is both theoret- 
ically and methodologically relevant to the present study. Next, we offer our new 
approach for achieving causal identification in spatial models by using simulated 
conflict events around road networks. Finally, we present results to demonstrate 
the feasibility of our approach and then conclude with recommendations for future 
researchers similarly seeking to use simulation techniques. 


2 Conflict and Violence Against Civilians 


The existing literature has not reached a consensus on whether armed conflict has 
any direct effect on the prevalence of violence against civilians. Sullivan (2012) 
found evidence that insurgent violence may increase the probability of massacres 
carried out by the state in order to inflict punishment and remove insurgent threats. 
Fjelde and Hultman (2014) showed that warring parties are more likely to inflict 
violence against civilians in geographical areas inhabited by the enemy's "ethnic 
constituency" for a similar reason: to undermine and weaken the enemy and their 
potential ethnic support base (p. 1233). This follows the findings of Valentino 
et al. (2004) demonstrating that, particularly in guerrilla wars, combatants target 
civilians in order to increase their own control and reduce collaboration between 
local populations and their adversary. The role of rebel strength and capacity has 
also been investigated with results suggesting that weaker insurgents are more likely 
to engage in violence against civilians as a means to raise the enemy's cost of 
fighting (Hultman 2007) or to compel support from the population (Wood 2010). On 
the whole, existing theories regarding the relationship between conflict and violence 
against civilians are largely driven by military strategy, whether that be reducing the 
enemy's base of support, coercing one's own support, or inflicting fighting costs to 
achieve concessions. 

While this literature offers compelling theoretical contributions with regard to 
the relationship between conflict and VAC, only recently have empirical studies 
thoughtfully examined the occurrence of armed conflict as a main independent 
variable. The introduction of georeferenced event-level data has been central in this 
emerging research agenda. Most notably, Raleigh (2012) utilized a time-location- 
actor-action model using event data from the Armed Conflict Location & Event 
Data Project (ACLED) and found a lack of co-occurrence of armed conflict events 
and instances of violence against civilians in time and space. Instead, a pattern of 
VAC events emerged around areas occupied by several active groups, suggesting 
that violence against civilians is not, in fact, "a strategy to gain civilian support or 
punish civilians" but rather a strategy more often used as a means for competition 
among violent actors (Raleigh 2012, p. 478). That these results run counter to the 
prevailing discourse highlights the importance of using event-level data to study the 
effect of armed conflict occurrence on VAC. 

Previous micro-level empirical studies investigating the relationship between 
conflict and civilian victimization have taken varied approaches to causal identifi- 
cation. In some works, the amount of spatial-temporal correlation between conflict 
events and VAC is taken as evidence for or against a theory of strategic civilian 
targeting in war (e.g., Raleigh 2012). However, since the location and timing 
of conflict occurrence is likely to be driven by strategic behavior related to the 
civilian population, there is a need to consider whether it is (a) armed clashes 
that are themselves causing civilian targeting or (b) the underlying conditions 
that provoke armed clashes that are simultaneously driving levels of violence 
against civilians. Other studies have addressed similar considerations in the case 
of indiscriminate violence more broadly (Lyall 2009, 2017; Schutte 2015). In 
seeking to determine the effect of indiscriminate counterinsurgent artillery fire on 
subsequent insurgent attacks, Lyall (2009) used a matching technique to compare 
shelled villages ("treatment") and non-shelled villages ("control") with difference- 
in-difference estimation. Additionally, Schutte (2015) employed a matched wake 
analysis (Schutte and Donnay 2014) on spatio-temporal event data to test the 
"treatment" of indiscriminate violence on civilian collaboration with the incumbent. 
These studies used improved techniques for causal identification. However, as we 
describe in the following section, this literature could benefit from designs based 
around the creation of simulated battle events as a relevant counterfactual to actual 
occurrence of violence. 


3 A New Strategy for Causal Identification: Creating 
Synthetic Events on the "Beaten Path" 


As the foregoing discussion has made clear, existing empirical studies have 
produced both uncertain and contradictory findings on the effect of armed conflict 
on civilian victimization. In part, this is a byproduct of the challenges of achieving 
causal identification using observational data on conflict events. The co-occurrence 
of armed conflict and violence against civilians in spatial-temporal windows (or lack 
thereof) cannot provide definitive evidence of a truly causal relationship. 

For a more accurate picture, it is necessary to identify and compare a set 
of counterfactual events against actual observed cases of conflict beyond simple 
matching or shallow definitions of "plausible areas". These "control" observations 
would offer an idea of what levels of civilian victimization we should expect in 
areas that were likely to be sites of conflict, but ultimately did not see any actual 
battles. In effect, this will allow us to isolate the true effect of the conflict events 
themselves, while partitioning out the unobservable or unmeasurable variables that 
are inherently correlated with both the locations of battle events and the likelihood 
of violence against civilians—such as a location's strategic military importance or 
its pre- and intra-war social networks. A plethora of studies have applied the logic 
of simulating conflict events as a way to create control observations with promising 
results (Lyall 2009; Kocher et al. 2011). 

However, the way in which we ought to determine where these hypothetical 
control events should be located remains up for debate. To study the effect of 
counterinsurgent violence on rebel responses, Lyall (2009) considers all Chechen 
villages as plausible control points; similarly, Kocher et al. (2011) examine all 
hamlets (i.e., small settlements) within the Republic of Vietnam during the Vietnam 
War. Yet, these approaches make a crucial, unstated assumption about where we 
should expect to see conflict. In each strategy, it is assumed that battle events can 
occur in any part of the country, with the likelihood of a given location weighted by 
relevant spatial covariates. 

This assumption has been strained by recent research into the spatial diffusion 
of conflict. In particular, scholars have identified the significance of road networks 
in determining where conflict occurs (Zhukov 2012). The availability and quality 
of roads are among the most crucial logistical constraints for state militaries and 
insurgents looking to sustain and grow their operations. Moreover, the strategic 
importance of roads and the settlements that lie along them makes these areas 
a hotspot for battles between warring parties seeking to gain an upper hand. 
Accordingly, the majority of armed battles take place in close proximity to 
roads. Indeed, in the case study of the Democratic Republic of the Congo we 
describe below, over 62% of all battle events occur within 5 km of a major roadway, 
despite this area covering less than 14% of the country's total 
land area.¹ As combatants move further away from the core road network, the costs 
of sustaining combat operations in more remote areas make it increasingly unlikely 
that armed actors will have the motivation or capacity to engage in violence (see 
Fig. 1). 

Fig. 1 Observed VAC and battle events in DRC. The legend distinguishes violence against 
civilians and battle events inside the road buffer from those outside it 

We argue that future efforts to simulate counterfactual conflict events should 
acknowledge the strong clustering pattern of battles around road networks. There 
are at least two substantive reasons for that. Firstly, insurgencies tend to display 
high degrees of mobility. Insurgents—and opposing forces alike—need to quickly 
move between strategic objectives in order to maximize their impact. For this reason, 
strategic infrastructure such as roads and bridges are crucial tactical elements. 
Secondly, major roads normally connect not only major settlements but also 
strategic infrastructure (ports, airports, energy plants). While data on settlements 
might offer a viable alternative given that civil wars tend to occur in more populated 
areas, they leave out a plethora of other points of interest that can be decisive for 
warring parties and where civilians tend to "cluster" even when leaving settlements 
(e.g., after displacement or mass flight). To this end, we propose a simple 
technique for creating relevant "control" battles where none are directly available 
in the data. Our goal here is to create a set of points that represent locations where 
armed clashes are likely to occur, but have not actually taken place. Subsequent 
occurrences of violence against civilians near these hypothetical battle locations can 
then be compared to the outcomes around locations of actual battles to determine 
the true causal effect of the battles themselves. 

¹ Similarly, 70% of events involving violence against civilians occur within this same area around 
roadways. 

Our proposal involves, first, creating a buffer area around all major roadways 
in the area under study. Figure 2 suggests that the particular width of the buffer is 
unlikely to cause any significant difference in our results for widths set at less than 
10 km from the roadways. Increasing the buffer width from 1 to 10 km results in 
a substantial increase in the amount of land area captured in the buffer area while 
producing only marginal improvements in the number of true battle events captured 
in that same area. For our analysis, we set the width of the buffer at 5 km on each 
side of the road (i.e., for a total width of 10 km at any given point). 

The buffer is then stored as a vector polygon and overlaid on the overall country 
polygon using commonly available geoprocessing tools.² Then, to 
create a set of simulated events, we rely on a simple point process model to randomly 
assign locations within this buffer polygon. This is done using a uniform Poisson 
process within the road buffer windows with intensity (i.e., points per unit area) 
equaling that of the observed events. This is, of course, an approximation, as we 
assume each location on the road network to be a candidate for a battle. 
While this is in line with our theoretical expectation, we recognize that other 
factors (e.g., population density) might make a certain area more prone to civilian 
victimization than another on the same road network. Nonetheless, this simplified 
approach better suits the aim of this paper and, as mentioned above, places emphasis 
on the importance of roads. Finally, we assign dates to these synthetic events by 
randomly sampling the dates of actual battles, in hopes of creating a similar temporal 
distribution of conflict occurrence. 

² In this specific case, we made use of "gBuffer" from the rgeos package (Bivand et al. 2018) for R due 
to its high customizability in creating polygons around point data. Other common tools include the 
"Buffer" function from ESRI ArcMap. 
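
The simulation step can be sketched in a few lines. This is an illustration under simplifying assumptions and not the implementation used for the analysis: coordinates are treated as planar kilometres (real data would first need to be projected), the road is a single hypothetical polyline, and the intensity and dates are invented. A uniform Poisson process is drawn over the bounding box and points falling outside the buffer are discarded, which leaves a uniform Poisson process of the same intensity inside the buffer.

```python
import math
import random

def distance_to_segment(p, a, b):
    """Planar distance (km) from point p to the road segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0
    if length_sq > 0:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def in_buffer(p, road, width_km):
    """True if p lies within width_km of any segment of the road polyline."""
    return any(distance_to_segment(p, a, b) <= width_km
               for a, b in zip(road, road[1:]))

def poisson_draw(lam, rng):
    """Poisson draw by Knuth's method; adequate for modest expected counts."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_battles(road, width_km, intensity, observed_dates, seed=0):
    """Synthetic (location, date) events inside the road buffer."""
    rng = random.Random(seed)
    xs, ys = [x for x, _ in road], [y for _, y in road]
    x0, x1 = min(xs) - width_km, max(xs) + width_km
    y0, y1 = min(ys) - width_km, max(ys) + width_km
    n_points = poisson_draw(intensity * (x1 - x0) * (y1 - y0), rng)
    events = []
    for _ in range(n_points):
        p = (rng.uniform(x0, x1), rng.uniform(y0, y1))
        if in_buffer(p, road, width_km):  # thinning: keep buffer points only
            events.append((p, rng.choice(observed_dates)))  # resample a real date
    return events

# Hypothetical inputs: an L-shaped road, 5 km buffer, 0.01 events per square km.
road = [(0.0, 0.0), (100.0, 0.0), (100.0, 80.0)]
dates = ["1998-08-04", "1998-11-20", "1999-03-15", "2000-01-09"]
synthetic = simulate_battles(road, 5.0, 0.01, dates)
print(len(synthetic), synthetic[:2])
```

Covariate-weighted intensities, such as one that rises with population density, could replace the uniform draw without changing the rest of the procedure.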

Figure 3 presents an illustration of this technique. In the upper left panel, the 
locations of observed instances of violence against civilians and battles are plotted 
in the case study area. The upper right panel adds major roadways throughout the 
country. The bottom left panel adds a 5 km buffer around those roadways and then 
excludes the observed battle and VAC events that occur outside this buffer area. This 
idea is formalized in the bottom right panel, where those remote events are dropped 
entirely and the relevant polygon becomes the road buffer area. Finally, in the same 
bottom right panel, a number of point locations are added within the buffer area to 
serve as simulated battle events for the modelling stage. 

Fig. 3 Demonstration of buffer creation and conflict event simulation. The different shades of red 
and blue indicate the concentration of events in the same area. Darker colors correspond to areas 
where more events took place 

After creating the simulated events within the buffer area and dropping observed 
events outside this polygon, we proceed to model civilian victimization by taking 
counts of these two classes of events as our predictors. Here, the 
simulated events act as a control group while the observed events represent a 
treatment condition. 

At this point, since our proposed approach does not fundamentally alter either the 
dependent variable or the nature of the point-location independent variables, there 
are numerous spatial models suitable for estimating the causal effect. We discuss 
our chosen modelling approach, matched wake analysis, in Sect. 5. 

Ultimately, we argue that by simulating conflict events only in close proximity to 
roadways, we are able to offer a more plausible counterfactual when assessing the 
effects of actual battle events. 


4 Data and Case Selection 


To test the method described above, we carry out an in-depth analysis of conflict 
processes in the Democratic Republic of the Congo (DRC) from 1998 to 2000—a 
period capturing the earlier portion of the Second Congo War—using data from the 
Armed Conflict Location and Event Data Project (ACLED) (Raleigh et al. 2010). 
ACLED includes spatially-tagged observations of both battle events and instances 
of violence against civilians. Each observation is coded using press reports from a 
range of local and national sources. Previous studies into the relationship between 
conflict and civilian victimization have made similar use of ACLED data (Raleigh 
2012). 

In the analysis below, we focus our attention on battle events as an independent 
variable explaining the occurrence of violence against civilians. In the ACLED data, 
battle events include any “violent interaction between two politically organized 
armed groups at a particular time and location" that may or may not result in a 
change of territorial control (ACLED 2017, p. 8). Here, we exclude what ACLED 
terms “remote violence,” or conflict events where the combatants are not physically 
present at the location of the violence as a result of using remote technologies such 
as improvised explosive devices (IEDs) or missile attacks. We removed this category 
because identifying the planned target of remote violence is not straightforward given the 
presence of collateral victims (most often civilians). Furthermore, identifying the 
perpetrator is difficult, and data on this are far from complete. Violence 
against civilians is defined as “a deliberate violent act perpetrated by an organized 
armed group against unarmed non-combatants" (ACLED 2017, p. 10). For both of 
these event types, we remain agnostic about the actors that are engaged in the battles 
or perpetrating the violence against civilians and include all cases of both rebel and 
state conflict. 

The DRC offers a useful case study to demonstrate our approach for several 
reasons. First, violence against civilians in the DRC exhibits a strong clustering 
pattern around road networks. The DRC also has a large country area, which 
helps to illustrate how small areas around roads are crucial to conflict occurrence. 
Moreover, it is a case that has experienced persistent conflict over many years, 
with an ample number of observations of both battle events and violence against 
civilians. Figure 4 shows a simple comparison between battle events and VAC 
events in the full ACLED sample (1997–2018) across the whole African continent 
with the DRC, Sudan, and Somalia having the highest levels of both measures.³ 
In the DRC, the magnitude of violence peaked during our temporal window 
between 1998 and 2000, the first two years of the Second Congo War. Finally, this 
case is especially informative given the relative mobility with which the conflict 
unfolded. In particular, the first phase of the conflict was highly "road-intensive." 
Belligerents sought to move quickly throughout the country to seize strategic 
locations, movement which frequently occurred via major roadways. 

Fig. 4 Battle versus VAC events by country in ACLED Africa data, 1997–2018 

³ For an analysis based on ACLED data on South Sudan, see Kelling and Lin in the chapter 
"Analysis of Conflict Diffusion Over Continuous Space" of this volume; for an agent-based model 
simulating the conflict in Somalia, see Duffy et al. in the chapter "Rebel Group Protection Rackets: 
Simulating the Effects of Economic Support on Civil War Violence". 
5 Modeling and Results 


In order to test the feasibility of our approach and re-evaluate the causal effect 
of armed battle events, we rely on Matched Wake Analysis (Schutte and Donnay 
2014). This modelling framework combines techniques for causal inference that 
allow us to evaluate the impact of our treatment on the dependent variable against a 
control group in a continuous temporal and spatial window. Matched Wake Analysis 
(MWA) combines a sliding windows design, statistical matching, 
and a difference-in-differences approach. This technique has been successfully 
applied in previous conflict-related empirical studies (e.g., Schutte 2017). 

In the MWA framework, all events are first classified as either "treatment" or 
"control". In our case, these correspond to the observed and synthetic battles, respec- 
tively. These georeferenced data are then linked to any number of geospatial covari- 
ates through nearest neighbor mapping. A balanced sample is then generated by 
matching on both the covariates and on pre-treatment trends in the dependent vari- 
able using Coarsened Exact Matching (CEM). Finally, a difference-in-differences 
design is used to estimate the treatment effect (Schutte and Donnay 2014). 
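The matching and differencing steps described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the MWA implementation: the field names (`treated`, `population`, `n_pre`, `n_post`), the bin widths, and the toy data are all hypothetical, and the regression used in the chapter is replaced here by a plain stratum-wise difference in changes.

```python
from collections import defaultdict

def coarsen(value, width):
    """Map a continuous covariate to the index of its bin (the coarsening step)."""
    return int(value // width)

def cem_strata(events, widths):
    """Group events into strata that share the same coarsened covariate values."""
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for ev in events:
        key = tuple(coarsen(ev[name], w) for name, w in widths.items())
        strata[key]["treated" if ev["treated"] else "control"].append(ev)
    # Keep only strata that contain both treated and control events.
    return {k: s for k, s in strata.items() if s["treated"] and s["control"]}

def did_estimate(strata):
    """Average treated-minus-control change in VAC, weighted by treated events."""
    def mean_change(group):
        return sum(ev["n_post"] - ev["n_pre"] for ev in group) / len(group)
    total_treated = sum(len(s["treated"]) for s in strata.values())
    return sum(
        len(s["treated"]) * (mean_change(s["treated"]) - mean_change(s["control"]))
        for s in strata.values()
    ) / total_treated

# Hypothetical toy data: observed battles are "treated", synthetic ones "control".
events = [
    {"treated": True,  "population": 1200, "n_pre": 1, "n_post": 3},
    {"treated": False, "population": 1500, "n_pre": 1, "n_post": 2},
    {"treated": True,  "population": 5200, "n_pre": 0, "n_post": 1},
    {"treated": False, "population": 5900, "n_pre": 0, "n_post": 0},
    {"treated": True,  "population": 9100, "n_pre": 2, "n_post": 2},  # unmatched
]
widths = {"population": 1000, "n_pre": 1}  # assumed bin widths
strata = cem_strata(events, widths)
effect = did_estimate(strata)
```

Matching on `n_pre` as well as on the covariate mirrors the idea of balancing on pre-treatment trends; events that find no counterpart in their stratum are simply dropped from the estimate.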

Figure 5 provides a graphical representation of this approach showing two units 
(depicted as cylinders), one with a treatment and one with a control event. The 
square in the left-side cylinder represents the occurrence of an observed battle event, 
while the triangle in the right-side cylinder represents a simulated battle. The stars 
depict single occurrences of VAC, our dependent variable, both as "prior activity" 
(before the observed or simulated battle, pictured in the lower end of each cylinder) 
and as "posterior activity" (after the battle, portrayed in the upper end of each cylin- 
der). At the bottom are the relevant spatial covariates on which units are matched. 
Fig. 5 Demonstration of matched wake analysis. Figure from Schutte and Donnay (2014),
reprinted with permission from the authors 
In our study, we estimate the following model:

$$n_{post} = \beta_0 + \beta_1 n_{pre} + \beta_2\,\text{observed battles} + u$$

Here, $n_{post}$ represents our dependent variable: a count of observed instances of
post-treatment VAC. $\beta_2$ represents the average treatment effect of observed battles,
while $\beta_1$ is the coefficient associated with $n_{pre}$, which accounts for the effect of
pre-treatment levels of violence against civilians. As discussed above, CEM matches
samples on the pre-treatment trends in VAC and on other spatial covariates. These 
control variables have been selected in accordance with the relevant literature on 
civil conflict discussed above. We include data on population (2000) from Gridded 
Population of the World (GPW4 2018) and the number of ethnic groups in the 
area from GeoEPR (Wucherpfennig et al. 2011; Vogt et al. 2015). Furthermore, 
we compute the distance from the capital city and the elevation associated with each
point.

Figure 6 depicts the estimated values of $\beta_2$, the average treatment effect of
observed battles, for each space and time window. We set the spatial window as
a range from 0 to 50 km and the time window as a range from 0 to 50 days. These 
intervals were chosen in order to capture the effect of battles in a fairly immediate 
time frame and within a “local” spatial domain. Larger distances or temporal periods
would introduce the possibility that variables that are either omitted from our
model or impossible to measure bias our results. In the 50 day-50 km
design, we are still able to test our model over a set of spatial-temporal windows 
that can help answer our main research questions and produce important policy 
implications. 
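To make the sliding-window design concrete, the following Python sketch counts VAC events before and after a battle within a given window. It is only an illustration: the planar coordinates in kilometers, the field names (`x`, `y`, `day`), the toy events, and the 5-unit step between cut-points (inferred from Table 1) are assumptions rather than details confirmed by the text.

```python
import math

def wake_counts(battle, vac_events, days, km):
    """Return (n_pre, n_post): VAC events within `km` of the battle that fall in
    the `days` before and the `days` after it. Coordinates are planar, in km."""
    n_pre = n_post = 0
    for ev in vac_events:
        if math.hypot(ev["x"] - battle["x"], ev["y"] - battle["y"]) > km:
            continue  # outside the spatial window
        lag = ev["day"] - battle["day"]
        if 0 < lag <= days:
            n_post += 1
        elif -days <= lag < 0:
            n_pre += 1
    return n_pre, n_post

# All (time, distance) cut-points from 5 to 50 in steps of 5 (assumed step size).
windows = [(t, d) for t in range(5, 51, 5) for d in range(5, 51, 5)]

# Hypothetical battle and VAC events.
battle = {"x": 0.0, "y": 0.0, "day": 100}
vac = [
    {"x": 3.0,  "y": 4.0,  "day": 110},
    {"x": 30.0, "y": 0.0,  "day": 120},
    {"x": 0.0,  "y": 10.0, "day": 90},
    {"x": 60.0, "y": 0.0,  "day": 105},
]
counts = {(t, d): wake_counts(battle, vac, t, d) for t, d in windows}
```

Each entry of `counts` supplies the pre- and post-treatment counts that would enter the regression for that window.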

As shown in Fig. 6, the treatment effect is positive across the whole window, 
suggesting that increases in instances of VAC occurred after observed battles took 
place, as compared to our synthetic battles between belligerents. While this is not 
surprising from a theoretical standpoint, the distribution of statistically significant 
effects reveals where we can be more confident about the effect of battle events. 

In particular, Fig. 6 reveals a positive and significant effect that depends equally
on time and distance from the battle event. The earliest period in which we see a
significant effect on VAC is in areas relatively near the battle event. Crucially,
however, the effect of battles increases as the distance from the battle location
grows. At shorter distances from battles (from 0 to 22.5 km), VAC is likely to
increase roughly 22.5–37.5 days after a battle event. As the distance from the
treatment increases (i.e., the interval between 22.5 and 42.5 km), we see an increase
in the temporal range in which we would expect to see an increase in VAC (roughly
27.5–42.5 days). Beyond 42.5 km, all time intervals from 27.5 to 50 days are significant.


Elevation is computed and subset from terrain data by USGS and NGA: GMTED2010 (Danielson and Gesch
2011).

The precise values of each cell in Fig. 6, along with their associated p-values, can be found in Table 1 at the
end of this document.
Fig. 6 Results of the MWA analysis of VAC in DRC (1998–2000). The contour plot represents
the effect of observed battles on the occurrence of violence against civilians (with the scale on the
right indicating the magnitude of the effect). The area not covered with crossed lines corresponds
to p < 0.05


This positive correlation between the timing and distance of the effect accords 
with an intuitive story of conflict diffusion. The results suggest that, after a battle 
event occurs, belligerents do not engage in VAC in the immediate aftermath. As the 
existing literature suggests, after taking control over a battle area combatants tend 
to invest resources in reinforcing their positions rather than seeking out civilians for 
immediate retribution. Conversely, the losing side will often either retreat entirely
(and thus be unable to engage in VAC) or adopt a "wait and see" behavior to
determine the stability of the new local power arrangement. Therefore, it is only
after roughly 25 days that we see a surge in civilian victimization and only in an area
fairly close to the battle location. The belligerents are most likely starting to secure
the area with "pseudo-policing" operations to tighten their grip on the territory.
Quite naturally, as the days pass, violence then spreads spatially as combatants seek 
to expand their base of control to surrounding populations. It is only at this point 
that we begin to detect a statistically significant effect in the areas more than 40 km 
from the battle location. 

As for the magnitude of these effects, the average over all significant combina- 
tions is 0.21 (Table 1). That is, when comparing the treatment and control groups 
on average, for every battle event, we should expect to observe 0.21 additional 
instances of violence against civilians in the particular significant spatial-temporal 
areas identified above. While actual battles seem to yield more subsequent civilian 
victimization, the buffer space represents an area of high-risk, where most of the 
clashes between belligerents and VAC events take place. The moderate, yet positive,
effect of the treatment as compared to the control group shows that the synthetic 
counterfactuals are “plausible” candidate locations for conflict, thus confirming our 
prior expectations. 


Table 1 Combinations of temporal and spatial areas showing the associated cut-points in days 
and kilometers 


Time (days) Distance (km) Effect size P-value Adjusted R-squared
25 5 0.189 <0.05 0.0313 
25 10 0.202 <0.05 0.0352 
25 15 0.195 <0.05 0.0335 
25 20 0.201 <0.05 0.0337 
30 5 0.199 <0.05 0.0336 
30 10 0.212 <0.05 0.0379 
30 15 0.204 <0.05 0.0357 
30 20 0.21 <0.05 0.036 
30 25 0.211 <0.05 0.0363 
30 30 0.224 <0.05 0.0394 
30 35 0.224 <0.05 0.0394 
30 40 0.237 <0.05 0.0435 
30 45 0.254 <0.05 0.049 
30 50 0.356 <0.05 0.0583 
35 5 0.207 <0.05 0.035 
35 10 0.221 <0.05 0.0394 
35 15 0.213 <0.05 0.0374 
35 20 0.219 <0.05 0.0376 
35 25 0.222 <0.05 0.0382 
35 30 0.235 <0.05 0.0415 
35 35 0.235 <0.05 0.0415 
35 40 0.249 <0.05 0.0458 
35 45 0.257 <0.05 0.0488 
35 50 0.356 <0.05 0.0588 
40 25 0.2 <0.05 0.0436 
40 30 0.211 <0.05 0.0466 
40 35 0.211 <0.05 0.0466 
40 40 0.232 <0.05 0.0547 
40 45 0.24 <0.05 0.0586 
40 50 0.344 <0.05 0.0638 
45 45 0.215 <0.05 0.0779 
45 50 0.321 <0.05 0.0697 
50 45 0.235 <0.05 0.0835 
50 50 0.274 <0.05 0.1002 


Each row records a single window along with the associated size of the effect, p-values, and 
R-squared 
It is important to note that we also tested our buffer model against an
"anywhere/anytime goes" model where simulated points were generated across the
whole country and conflict timespan. This approach yielded very different results 
in terms of statistical significance as a result of less efficient artificial counterfactual 
events. In particular, this baseline model showed an overabundance of significant 
areas across the whole spatio-temporal window with a less clear pattern in both 
space and time. This suggests that a less restrictive approach to counterfactual 
simulation could yield an overidentification of significant results, rather than the 
more nuanced picture that emerges from our analysis. 


6 Conclusion 


This chapter has offered a preliminary step towards a new method for identifying the 
causal effect of armed conflict on civilian victimization. We have proposed the use 
of simulated conflict events as a method for creating relevant counterfactual events 
in cases where we can only observe actual instances of violent clashes between 
armed groups. By leveraging the strong clustering behavior of armed conflict around 
road networks, our strategy for simulating these "control" events acknowledges the 
underlying drivers of where true conflict activity is most likely to occur. 

Using such an approach on a case study analysis of the Democratic Republic 
of the Congo, we found that armed battles tend to result in increased levels of 
violence against civilians across certain spatio-temporal windows. This increase can 
be observed in the immediate spatial proximity of the battle and reverberates across 
larger distances as well. Regarding the time component, belligerents seem to con- 
sistently engage in VAC after roughly 25 days from the occurrence of a battle event. 

While our attention has focused on the consequences of battle events, the frame- 
work presented here should extend to a broader class of conflict-related independent 
variables that tend to diffuse and cluster along road networks. Protests, military base 
establishments, and remote violence, such as missile or bomb attacks, all exhibit 
this behavior. The analysis of the causal effects of these events could similarly 
benefit from a simulation-based approach to identifying relevant counterfactual 
observations.

There are, of course, limitations to our approach. First, our simulation strategy 
assumes that the coders of data on battles and VAC events are correctly assigning 
those events to their exact locations. When conflict events occur in remote areas, 
it may be the case that either media sources or dataset coders elect to record the 
location of those events as a nearby settlement rather than the true coordinates. 
Normally, such a decision would represent a small amount of measurement error, 
but, in our framework, it is crucial to modelling outcomes. Since most settlements 
lie along road networks, the assignment of observations to nearest settlements could 
inflate the amount of conflict occurring within our road buffer polygons, thus biasing 
estimates of the true causal effect in these areas. Future approaches may encompass
a "hybrid buffer" weighted on the basis of existing settlements.
It is unclear whether this type of error occurs in commonly used event datasets.
As Eck (2012) has shown, these datasets, when they are created from media sources, 
tend to be biased towards greater coverage of urban areas in general. The ACLED 
coding guidelines are fairly robust against coders misassigning observations and 
the dataset does include a variable indicating the precision of the coordinates 
(ACLED 2017, pp. 25-26). In future applications, this variable could be used to
filter out low-certainty observations. However, these checks at the coder-level would 
not prevent misreporting by the media sources further upstream (i.e., non-local, 
national, and international media). Thus, our approach is not immune from the 
literature's ongoing concerns about the precision of spatial conflict data. 
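Filtering on the precision variable could look like the short Python sketch below. It assumes the field is named `geo_precision` and that lower codes indicate more precise coordinates, as in ACLED exports we are aware of; the event records themselves are hypothetical, and the appropriate threshold would need to be checked against the codebook for the data version in use.

```python
def filter_by_precision(events, max_code=1, field="geo_precision"):
    """Keep events whose precision code is at most max_code (lower = more precise)."""
    return [ev for ev in events if int(ev[field]) <= max_code]

# Hypothetical event records.
events = [
    {"event_id": "a", "geo_precision": 1},
    {"event_id": "b", "geo_precision": 2},
    {"event_id": "c", "geo_precision": 3},
    {"event_id": "d", "geo_precision": 1},
]
precise = filter_by_precision(events)
```

A stricter threshold trades sample size for locational certainty, so the cut-off is itself a modeling decision worth reporting.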

Second, our proposed framework makes a strong assumption about where 
simulated events should be located. A simple implementation of our approach, as 
shown above, ensures that no simulated events will fall outside the roadway buffer 
area. A more nuanced approach, which would remain methodologically consistent 
with our general framework, could involve a probabilistic model to simulate events. 
Using this strategy, the likelihood that a simulated event is tagged at a given point 
would decline exponentially as the distance of that point from a major roadway 
increases. Here, the general principle of roadways as crucial to explaining the spatial 
distribution of conflict is maintained, while allowing for greater variance in the 
simulated locations. 
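One way to sketch such a probabilistic simulation is rejection sampling with an acceptance probability that decays exponentially in the distance to the nearest road. The Python example below is a minimal illustration under several assumptions: coordinates are planar and in kilometers, roads are straight segments, and the single road, the bounding box, and the 5 km decay scale are hypothetical values chosen only for demonstration.

```python
import math
import random

def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simulate_events(roads, bbox, n, scale_km, rng):
    """Rejection-sample n points; a candidate at distance d from the nearest
    road is accepted with probability exp(-d / scale_km)."""
    xmin, ymin, xmax, ymax = bbox
    points = []
    while len(points) < n:
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        d = min(dist_to_segment(p, a, b) for a, b in roads)
        if rng.random() < math.exp(-d / scale_km):
            points.append(p)
    return points

roads = [((0.0, 50.0), (100.0, 50.0))]  # one hypothetical east-west road
rng = random.Random(42)                 # fixed seed for reproducibility
sim = simulate_events(roads, (0.0, 0.0, 100.0, 100.0), 500, 5.0, rng)
```

A small `scale_km` approximates the hard buffer used in this chapter, while larger values let simulated events spread further into the hinterland.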

Third, by excluding battle events far from roadways (or downweighting their
likelihood of occurring in probabilistic simulations), we run the risk of overlooking 
a possible interaction effect between armed battle events and their distance from 
population centers. That is, there is a possibility that battles may provoke greater 
civilian victimization when those battles occur in remote areas. Combatants may 
assume that attacks on civilians will be less publicized or less likely to result 
in retribution if they are done in the hinterland. To our knowledge, this scenario 
remains untested in the existing literature, but given its relevance to the present 
research design, we recommend that future studies investigate the possibility of
such an interaction effect. Under our current approach, excluding these remote 
observations precludes the possibility of detecting this type of causal relationship. 

With these limitations in mind, we argue that simulating counterfactual conflict 
events along road networks offers a defensible and advantageous strategy for causal 
identification in observational events data. Simply put, we want future researchers 
to understand that they can and should try to find better counterfactuals for battle 
events. Doing so will require creative thinking about where conflict is likely to 
occur. Extending this approach to other cases would also help to establish whether 
these effects are representative of civil conflict dynamics in general or are instead 
case-specific. 

In this chapter, we have proposed one solution, using road buffers as a
simplifying condition to simulate counterfactual conflict events; many other
options exist that could fit within our framework. We hope conflict scholars will
continue to improve upon our approach and take up with greater zeal the potential 
offered by synthetic event simulation in conflict studies. 
Analysis of Conflict Diffusion Over
Continuous Space


Claire Kelling and YiJyun Lin 


Abstract This study illustrates an innovative application of methods in spatial 
statistics to study the diffusion of conflict events. We investigate how spatial pro- 
cesses of conflict events vary with different characteristics of the events and actors 
involved in the events. Actor-level attributes have often been ignored in existing 
empirical studies, which could lead to insufficient modeling of conflict processes 
and patterns. Due to recent technological and systems advances, conflict events 
can now be analyzed using data measured at the event (point) level, rather than 
relying on aggregated units. Our research contributions are twofold. First, through 
the case of South Sudan, we demonstrate how intensity and covariance functions, 
defined by the log-Gaussian Cox process model, can be used to explore the complex 
underlying diffusion mechanism under various characteristics of conflict events. 
Second, our findings add to the explanation for the process of conflict diffusion. 
Our analysis reveals that battles with territorial gains for one side tend to diffuse 
over larger distances than battles with no territorial change, and that conflicts with 
longer duration exhibit stronger spatial dependence. 


Keywords Point process - Continuous space models - Diffusion - Conflict 
duration - Spatial statistics 


1 Introduction 


Prolonged civil conflict and war as well as humanitarian crises occur frequently 
and are widespread on the African continent. This study selects the case of South
Sudan as an example to illustrate how our methodological approach can improve 
understanding of conflict diffusion. Certainly, research findings drawn from a single 


C. Kelling (✉)
Department of Statistics, Pennsylvania State University, State College, PA, USA
e-mail: ckelling@vt.edu


Y. Lin 
Department of Political Science, University of Nevada, Reno, Reno, NV, USA 


© The Author(s) 2020
E. Deutschmann et al. (eds.), Computational Conflict Research,
Computational Social Sciences, https://doi.org/10.1007/978-3-030-29333-8_10



case study are difficult to generalize. Conflict processes in South Sudan, however, 
fit the methodological goal of this study because a variety of actors have engaged in
different types of conflict. This could provide a rich dynamic for how conflict events 
spread around areas within the country given the capacity of actors. Moreover, 
lessons drawn from South Sudan could also shed light on the theory of conflict: 
this country's history of dependence indicates that diffusion of conflicts within a 
country can unfold even with a peace agreement among contentious parties.

From South Sudan's independence in July 2011 up to December 2013, President
Salva Kiir and opposition leader and former Vice President Riek Machar
successfully integrated rebel groups into the national army and gave money to
hundreds of generals and soldiers. Some of these generals and soldiers have become
ministers at the national or local level, occupying key roles in the 35 states into which
South Sudan is now divided. This problem is further complicated by the fact that
there are around sixty indigenous ethnic groups in South Sudan, and that its national
income, chiefly from oil, has been used to buy the loyalty of these generals and soldiers.
The consequence is that South Sudan has been trapped in prolonged civil conflicts
and wars since 2013 (GlobalConflictTracker, 2019), and has also become one of the
most conflict-prone countries in the world, attracting much attention from the
international community (CrisisGroup, 2017). The civil conflicts and wars in South
Sudan have caused approximately four million people to be internally displaced 
within South Sudan or to flee to neighboring countries. This phenomenon further 
exacerbates regional conflicts and humanitarian crises. Humanitarian access has 
been obstructed by the increasing intensity of active hostilities or inter-communal 
violence (OCHA, 2018). 

The historical conflict process in South Sudan is, therefore, counter-intuitive for 
the following two reasons. First, it is expected to be extremely difficult to peacefully 
moderate conflicts in South Sudan due to the presence of multiple factions against 
the government and the ubiquitous nature of interconnected civil conflicts. Yet 
South Sudan was able to reach and sign a Comprehensive Peace Agreement (CPA) 
with Sudan in 2005 in an attempt to end 50 years of violent conflict. Second,
the international community believes that working on peace deals between the 
government and opposition is the main solution to the prolonged conflict. However, 
such a peace agreement quickly failed to resolve local grievances and maintain
peace, and the South Sudanese people increasingly believe that violence is a
necessary means to achieve peace (De Vries and Schomerus, 2017). Thus far, the
international community is still puzzling over what policy tools can effectively
deal with these massive conflict-driven crises. South Sudan's experience deserves
attention because misguided solutions may be adopted if the international community
fails to recognize that peace and conflict are two sides of the same coin.

A conventional approach to building conflict theory and testing associated 
hypotheses has primarily concentrated on national-level institutional characteristics 
and state motivations or capability of going to war by using country-year aggregate 
data (Hegre and Nygard, 2015; Hegre, 2014; Maves and Braithwaite, 2013; Hoeffler, 
2012; Braithwaite, 2010). This approach has provided insights for understanding 
the persistent nature of conflict in many countries. However, this approach cannot 
capture whether and how varying types of conflict and actors involved are spatially
linked to each other in ways that exacerbate conflicts within a country over time. 
On the one hand, country-level institutional characteristics or aggregate economic 
conditions are often time-invariant or slow-moving factors. As a result, it is hard to 
justify the condition under which time-invariant variables might influence dynamics 
of conflict events. On the other hand, although domestic conflict events and 
the formation of rebel groups were believed to be inevitable in countries where 
governments are corrupt or political institutions are weak (Collier et al., 2009; 
Fearon and Laitin, 2003), non-state actors may have different grievances, demands, 
and means of waging conflict. Therefore, countries could experience distinct and
various internal conflict processes and outcomes. 

Moreover, conflict scholars have not reached consensus with regard to whether 
violence against civilians emerges as irrational random attacks, as a consequence of 
the tension between ethnic groups, or as an instrument for powerful groups to reach 
their political and military goals (Valentino, 2014, see also Salvi et al. in the chapter 
"Violence Against Civilians" in this book). Although it is likely that a country's 
political characteristics can drive civil unrest, many conflict events in Sudan were 
driven by actors' local demands and grievances. Some of these conflicts diffuse 
or spread from their initial locations, while other conflicts do not spread out across 
regions within a country's territory (Raleigh et al., 2010). Thus, country-level factors 
fail to explain spatial patterns of conflict diffusion within a country, especially due 
to actor types such as whether they are state, non-state, or civilian actors. 

Fortunately, the recent advances in technological systems and methodologies 
enable conflict scholars to tackle actor-level characteristics as a mechanism by 
using conflict data at the event (point) level, rather than relying on aggregated 
units, which can better capture local dynamics and spatial variations in conflict. 
In this paper, we make use of disaggregated spatial data, together with continuous 
space models, through which we shift from the conventional monadic, state-based
approach to an actor-centered approach to reexamine different types of conflict and 
test conflict diffusion on the local level. This study also deviates from the traditional 
approach of analyzing the effect of actor-level characteristics on conflict events at 
the country level, such as actors' ethnicity, and their capabilities of appealing to 
violence (Cederman and Gleditsch, 2009; Buhaug and Gleditsch, 2008; Harbom 
et al., 2008; Toft, 2005), by incorporating the spatial and temporal process of conflict 
as well as actor-level characteristics at the conflict point level. This analytical 
strategy fills a gap in the literature, in that it helps to answer what causes some 
differences in conflict diffusion patterns. In some cases, small conflicts scale up to a 
violent conflict event, while in other cases they do not and these conflict events may 
or may not further spread around areas within a country (Hoeffler, 2012; Buhaug 
et al., 2011). So far, only a few empirical works analyze conflict in continuous space
(Zammit-Mangion et al., 2012). Our study builds on the literature of conflict point 
processes over continuous space through detailed analysis of duration length, actor 
types, conflict types, and temporal elements. 

In this study, we analyze five types of conflict events: "Battle-Government
regains territory," "Battle-No change of territory," "Battle-Non-state actor
overtakes territory," "Riots/protests," and "Violence against civilians." These types
of conflict are closely related to the political instability and contention that have
accompanied South Sudan's history as an independent country since
2011. In some cases, civil conflicts may be smaller episodes of a larger civil war,
while in others they are not. By analyzing these event types in detail, this study captures
scenarios where actors have relatively equal military power (e.g., the first three
categories of conflict) versus scenarios where actors have unequal capacity to resort to
violence (e.g., the last two categories), as well as differences in the spatial diffusion
mechanism of conflict events by these event types.

By incorporating different types of conflict events and actors involved in conflict, 
and modeling the conflict process of these events, this study is able to capture 
differences in patterns of conflict diffusion in South Sudan. This analytical strategy 
enables us to assess a range of conflict dynamics because large-scale violence 
encompasses episodes of random killing, violence against civilians as well as the use 
of selective or indiscriminate violence (Koc-Menard, 2006). These different types 
of conflict reveal various purposes and/or goals pursued by the actors engaged in 
conflicts, such as advancing military interests, social identity, or political loyalty 
(Schwartz and Straus, 2018; Valentino, 2014; Balcells, 2011). We believe it is 
theoretically meaningful to reexamine different types of conflict based on the 
actor-level characteristics because actors involved in these events vary with their 
motivation, strategic behaviors, and capabilities of appealing to violence, which 
could further lead to distinct conflict processes and patterns. Distinguishing between 
state, non-state, and civilians as actors is theoretically meaningful because these 
actors directly determine where and when conflicts might take place as well as the 
condition under which their motivations, capabilities, and strategic responses might 
(or might not) lead to full-blown civil war (Cunningham et al., 2013). 

Lastly, we study the difference in diffusion mechanisms by the duration of the 
conflict event. In many ecological and epidemiological models as well as models 
on crime, the data that is collected is simply an event or observation at a certain 
point and time (Wang et al., 2016; Liang et al., 2008; Best et al., 2000; Sparks, 
2011). However, this may not be true of conflict data, where an event can have a 
wide range in its duration. Some events merely last for 1 day, while other events 
persist for several days or weeks at certain time periods. Therefore, we differentiate 
between conflict events that last only 1 day and conflict events that last longer than
1 day in our study. 

The paper is organized as follows. In the next section, we summarize relevant 
research in three categories: empirical studies on the spread of conflict and wars, 
methodological studies on models for grid data, and continuous space models. 
Section 3 summarizes the data we use in this study, and in Sect. 4 we describe the 
methods and present the results, comparing conflict events based on time, conflict 
type, and actors involved. We conclude with a summary and some caveats in Sect. 5. 
2 Related Work


2.1 Empirical Studies on the Diffusion of Conflict 


When it comes to spatial modeling, scholars have often either focused on spatial 
dependence alone or spatial heterogeneity by using distances between conflict 
events or borders between cities and states (Anselin and O'Loughlin, 1990). 
Typically, empirical works on modeling space or location for conflict events have 
relied on modeling the covariance structure directly, or non-constant error variances 
in a regression model (Anselin and Baltagi, 2001; Starr, 2003; Ward and Gleditsch, 
2002). Thus far, there is no consensus with regard to what methods can better
identify empirical patterns of conflict diffusion in a robust way, and there are no 
concrete theoretical linkages pointing out how spatial effects might influence the 
diffusion of conflict. For example, there could be demonstration effects or diffusion 
effects. The former refer to the effect derived from either organized or spontaneous 
collective action. The latter often rely on a larger scale and latent learning process, 
through which rebel groups learn by observing others. There could also be contagion 
effects, as rebels move over borders to incite rebellion by their co-ethnics in 
neighboring countries. Even though methods for robustness tests are available (Bera 
et al., 2019; Penghui et al., 2015), they are ad hoc. Therefore, it is unclear whether 
differences in empirical results are due to estimation methods, measurement, data 
sources, and/or spatio-temporal coverage (Hegre and Sambanis, 2006). 

Schutte and Weidmann (2011) model both relocation and escalation diffusion 
through a join count statistic. However, they use gridded units for their analysis,
and therefore rely on aggregation to areal units. Additionally, although Loeffler 
and Flaxman (2017) present a strong analysis of diffusion of crime over continuous 
space, they do not differentiate between these two kinds of diffusion presented by 
Schutte and Weidmann (2011) or other characteristics of events that are critical to 
understanding conflict. Therefore we are interested in how the log-Gaussian Cox 
Process (LGCP) model framework may be parameterized to model diffusion of 
conflict data, with various characteristics of events that are crucial to understanding 
the diffusion of conflict events. 


2.2 Grid Models 


We continue our review of related work by studying the use of grid models to 
motivate why continuous space models are necessary and feasible for the study of
conflict events. In past research, grid squares were often used as the unit of analysis
because they allow researchers to specify the scale of the analysis based either on a
theoretical framework of expected conflict distributions or on available sub-national 
data. Gridded squares have been increasingly used in empirical conflict studies due, 
in part, to the fact that aggregate-level administrative or other types of areal units, 
such as counties/regions or Census tracts, can mask conflict dynamics (Fjelde and
Hultman, 2014; Dittrich Hallberg, 2012; Wood and Sullivan, 2015). Furthermore, 
the use of sub-state conflict event data offers the opportunity to explore the local 
distribution of violence over time. 

For example, the PRIOGRID data set is an aggregate format of global-scale 
disaggregate (geo-coded point level) armed conflict events (Tollefsen et al., 2016). 
PRIOGRID counts of conflict events are the sums of point level conflict events 
that happened within a given spatial grid cell. This data set has contributed to the 
emergence of many new studies (Von Uexkull et al., 2016; Theisen et al., 2012; 
O'Loughlin et al., 2012; Hendrix and Salehyan, 2012). Nonetheless, when using 
gridded data, the Modifiable Areal Unit Problem (MAUP) represents a significant 
challenge where the level of aggregation can influence the results of the statistical 
analysis. The MAUP can significantly weaken and bias a statistical result when 
smaller areal units are aggregated to form larger areal units. Under this scenario, 
rezoning or moving boundaries of an area can have significant effects on the results 
(Wong, 2009). 
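The sensitivity to aggregation can be seen in a small Python sketch that counts the same point events in grid cells of two different sizes. The coordinates and cell sizes are hypothetical and unitless; the example is meant only to illustrate the mechanism, not to reproduce any gridded dataset.

```python
from collections import Counter

def grid_counts(points, cell_size):
    """Count events per square grid cell of side cell_size."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)

# Hypothetical event locations.
points = [(1, 1), (2, 8), (9, 2), (11, 1), (12, 9), (18, 3), (19, 19), (3, 18)]

fine = grid_counts(points, 5)     # 5 x 5 cells
coarse = grid_counts(points, 10)  # 10 x 10 cells

max_fine = max(fine.values())
max_coarse = max(coarse.values())
```

With the finer grid every occupied cell holds a single event, whereas the coarser grid groups the same events into cells that look like local concentrations, so any statistic computed on cell counts depends on the chosen resolution.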

There is little to no agreement across studies in terms of existing empirical 
findings using grid cells as the units of analysis, nor can this modeling strategy 
be used to capture the complex spatial dynamics and diffusion of conflict processes 
because it is limited by its aggregated nature to a certain resolution of analysis and 
accuracy. Therefore, we turn our attention to continuous space models where we do 
not rely on aggregation to areal units, such as grid cells. 


2.3 Continuous Space Models 


There are few empirical works in conflict studies that analyze conflict events 
over continuous space. Zammit-Mangion et al. (2012) provide one of the first 
attempts to model conflict, instead of crime or disease diffusion, over continuous 
space. However, they do not specifically discuss the parameterization of diffusion, 
although they suggest that diffusion could be modeled within their framework. 
Instead, they measure volatility and heterogeneous growth and give explicit 
parameterizations for each. They use a continuous space and discrete time version 
of the log-Gaussian Cox Process (LGCP) model, as they recognize that conflict 
data is reported at the day level rather than at the exact time. They use the 
Stochastic Integro-Differential Equation (SIDE) approach to estimate the effects of 
time. The main focus of their study is prediction rather than the interpretation of 
the parameters. 

There is some existing work on measuring diffusion of other types of events in 
continuous space, where the authors use the Hawkes process to model crime for 
example (Loeffler and Flaxman, 2017). However, Loeffler and Flaxman (2017) 
study gun crimes, a classical example of a point process in which each event 
happens at an exact instant in time. As stated above, in conflict data an event 
happens over the span of a day, or sometimes several days or weeks. Therefore, in 
our approach, we also analyze the difference in diffusion dynamics between events 
that last longer than 1 day as compared to events that only last 1 day. 

We characterize conflict diffusion mechanisms in continuous space across South 
Sudan for the five types of conflict events discussed above, based on the duration 
of conflict events and military power of actors involved in conflicts. Thus far, 
researchers studying diffusion processes of conflict and some other social phenom- 
ena, such as crime, often use areal unit models, where events and other kinds of data 
are aggregated to usually administrative areas or grid cells by either the data provider 
or the researcher. Therefore, even when point or event level data is available, these 
processes are not frequently studied in continuous space. Though there have been 
some studies of conflict events in continuous space (Zammit-Mangion et al., 2012), 
none has specifically parameterized diffusion in conflict event data in continuous 
space. Therefore, we tackle the challenge of modeling spatial dependence of conflict 
events in continuous space. 


3 Data 


Event or point level data is characterized by an exact location, given as coordinates, 
and often also by a date or time associated with the event. We refer to these events 
as points, since they occur at exact locations in space and form the “points” in our 
“point process” statistical methodology. Using data from 
the Armed Conflict Location and Event Data (ACLED) (Raleigh et al., 2010), this 
study examines conflict processes of five event types, including three types of battles 
(in which armed forces are often involved with different outcomes of territorial 
integrity), riots/protests (public demonstration and violent protest), and violence 
against civilians (physical harm to civilians, which can also be committed by rioters 
and/or to protesters) in South Sudan from 2011 to 2018. ACLED collects this data 
through news from international, regional, and local media reports as well as NGO 
accounts (ACLED, 2019). 

Our full dataset consists of 4654 conflict events in South Sudan from July 14, 
2011 through December 10, 2018. We see in Fig. 1 that there are many more conflict 
events that occur after 2014. Figure 1 also illustrates the event type over time, where 
most of these events are battles where there is no change in territory and violence 
against civilians. The counts of each of these event types for the full time period are 
presented in the legend of Fig. 1. 

In Fig. 2a, we see the spatial distribution of these 4654 events over the country 
of South Sudan, where most of the events occur in the middle of the country. Each 
point is drawn with some transparency, so darker colors in Fig. 2a and b indicate 
that more events occurred at that location. 

Fig. 1 Frequency of conflict events in South Sudan over time, by conflict event type. The 
histogram bars are stacked on top of each other by event type 

Fig. 2 Spatial distribution of conflict events over South Sudan. In Fig. 2a, all points, which 
represent conflict events, are equal size and the darker the point, the more events have occurred at 
that location. In Fig. 2b, the size of the point represents the duration of the event, and the shading 
again represents more events at that location. (a) Location only. (b) With duration 

Table 1 We show common actors by frequency of conflict event involvement in South Sudan. 
We include the actor type that we assigned to each of these actors, in order to provide examples of 
actors in each category 

Actor name                                                Frequency  Actor type assigned
Military Forces of South Sudan (2011-)                    2401       State
Civilians (South Sudan)                                   1719       Civilian
Sudanese Peoples Liberation Army/movement-in opposition   1536       Non-state
Unidentified Armed Group (South Sudan)                    989        Non-state
Police Forces of South Sudan (2011-)                      190        State
Protesters (South Sudan)                                  177        Civilian
Murle Ethnic Militia (South Sudan)                        161        Non-state
Unidentified Communal Militia (South Sudan)               156        Non-state
Mutiny of Military Forces of South Sudan (2011-)          144        State
Rioters (South Sudan)                                     99         Civilian

In the ACLED dataset, each day of an event is given a separate row in the dataset. 
If we count each of the days of a given conflict as separate events, then we have a 
total of 4654 conflict events as stated earlier. However, if we treat events as the same 
if they share the same location and description and occur within similar time periods, 
and then assign the duration as the number of days that the event persists, then we 
only have 4160 unique events in our dataset. When duration is defined in this way, 
the duration of an event ranges from 1 day to 24 days. Most events last 1 day, with 
the mean duration being 1.11 days. In Fig. 2b, we see the duration of these events 
illustrated by the size of the point. Figure 2b gives a sense of the spatial distribution 
of the events that last longer than 1 day. For the purposes of our analysis, because 
the vast majority (93%) of conflict events only last 1 day and we would like 
to treat these events as independent, we take the first day of the given conflict 
event and assign each event its duration as a covariate. Therefore, our final dataset 
for modeling purposes consists of 4160 unique events, all of which are assigned their 
duration as a characteristic of the event, which we will analyze in detail later. 
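
The collapsing of daily rows into unique events with a duration covariate can be sketched as follows. This is a minimal illustration, not the procedure actually used for the chapter: the function name `collapse_events`, the row layout, and the rule that rows at most `max_gap_days` apart belong to the same event are all assumptions (the text above only requires "similar time periods"), and the example rows are invented.

```python
from datetime import date, timedelta
from itertools import groupby

def collapse_events(rows, max_gap_days=1):
    """Merge daily rows that share a location and description into events.

    rows: iterable of (day, location, description) tuples, one per reported day.
    Rows in the same group whose dates are at most `max_gap_days` apart are
    treated as one event (an assumed rule). Each returned dict holds the first
    day of the event and its duration in days.
    """
    same_event = lambda r: (r[1], r[2])
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    events = []
    for (location, description), group in groupby(ordered, key=same_event):
        current = None
        for day, _, _ in group:
            gap_ok = (current is not None
                      and day - current["last"] <= timedelta(days=max_gap_days))
            if gap_ok:
                current["last"] = day
                current["duration"] += 1  # one more reported day of this event
            else:
                current = {"start": day, "last": day, "location": location,
                           "description": description, "duration": 1}
                events.append(current)
    return events

# Invented rows for illustration: (day, (lat, lon), description).
rows = [
    (date(2016, 7, 8), (4.85, 31.58), "clashes"),
    (date(2016, 7, 9), (4.85, 31.58), "clashes"),
    (date(2016, 7, 10), (4.85, 31.58), "clashes"),
    (date(2016, 7, 20), (4.85, 31.58), "clashes"),
    (date(2016, 7, 9), (9.53, 31.66), "protest"),
]
events = collapse_events(rows)
for e in events:
    print(e["start"], e["location"], e["duration"])
```

Keeping only the `start` date of each merged event, with `duration` attached as a covariate, mirrors the choice described above of treating each event as a single point.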

Through the ACLED dataset, we have two actors that are included with most 
events. There are no events that are missing the first actor and there are only 192 out 
of the 4160 events in our final dataset that are missing a second actor, i.e., where 
there was only one actor involved. These are exclusively events where there are only 
rioters or protesters involved in the event. There may be events with more than two 
actors involved, but ACLED only provides the first two actors involved. From these 
4160 events, we have 285 unique actors in our dataset. We list some of the most 
commonly occurring actors in Table 1, along with their assigned actor type. 

As mentioned in Sect. 1, it is important to consider the difference in diffusion 
mechanisms between events that include state and non-state actors as well as 
civilians/rioters/protesters. To address this difference, we re-code and divide all of 
the ACLED actors into state actors, non-state actors, and civilians/rioters/protesters. 
We abbreviate the last category as simply "civilians" going forward. State actors 
include governments, governmental agencies (e.g., security department or police 
department), and the military force formed by a government or ruling party. 
Non-state actors contain a wide range of rebel groups, including identity militias 
and political opposition organizations. The last group includes civilians, rioters, 
protesters, and prisoners. The main difference between non-state actors and this 
category of civilians/rioters/protesters/prisoners is that non-state actors are organized 
actors with military power and the civilian groups have no military power. 

Table 2 We show the counts of conflict events by the actor dyads that were involved for South 
Sudan from 2011 to 2018. We also provide an example of an actor dyad for each dyad type. For 
the case of only civilian involvement, these events often did not include a second actor 

Actors involved          Frequency of event  Example
State and non-state      1556                Yau Yau Rebels, Government of South Sudan
Non-state and civilian   1161                Dinka Ethnic Militia, civilians
State and civilian       579                 Military Forces of South Sudan, civilians
Only non-state           415                 Kuei Ethnic Militia, Rup Dinka Ethnic Militia
Only state               230                 Military Forces of South Sudan, Police Forces of South Sudan
Only civilian            219                 Protesters (one actor)


After this categorization, we then assess the difference in diffusion of conflict 
events that include different combinations of these kinds of actors, including only 
state actors, only non-state actors, or only civilians as well as the dyads of these 
actor types. In Table 2, we show the number of conflict events that fall under each 
actor dyad. We see that most events involve either non-state and state actors or 
non-state actors and civilians. We note that at times, as mentioned earlier, there is only 
one actor involved, in which case the event is categorized as only state, only 
non-state, or only civilian. However, this only occurs in 4.6% of the events. An 
example for each kind of event is included in Table 2. 
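
The re-coding of actors into dyad categories described above can be sketched in a few lines. The snippet is illustrative only: the lookup `ACTOR_TYPE` contains just a handful of the actors and their assigned types from Table 1, and the names `ACTOR_TYPE`, `ORDER`, and `dyad_label` are hypothetical, not taken from the chapter's actual code.

```python
# Actor types for a few actors, as assigned in Table 1 (subset for illustration).
ACTOR_TYPE = {
    "Military Forces of South Sudan (2011-)": "state",
    "Police Forces of South Sudan (2011-)": "state",
    "Civilians (South Sudan)": "civilian",
    "Protesters (South Sudan)": "civilian",
    "Murle Ethnic Militia (South Sudan)": "non-state",
    "Unidentified Armed Group (South Sudan)": "non-state",
}

# Fixed ordering so that each dyad gets one canonical label.
ORDER = ["state", "non-state", "civilian"]

def dyad_label(actor1, actor2=None):
    """Return the dyad category for an event with one or two recorded actors."""
    types = {ACTOR_TYPE[a] for a in (actor1, actor2) if a is not None}
    if len(types) == 1:
        # Either a single actor or two actors of the same type.
        return "only " + next(iter(types))
    first, second = sorted(types, key=ORDER.index)
    return first + " and " + second

print(dyad_label("Civilians (South Sudan)", "Murle Ethnic Militia (South Sudan)"))
print(dyad_label("Protesters (South Sudan)"))
```

Events with a missing second actor fall into one of the "only" categories, which matches the treatment of single-actor events described above.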


4 Analysis 


4.1 Test for Complete Spatial Randomness 


To begin our analysis, we first test for complete spatial randomness, that is, whether 
our point process of conflict events follows a homogeneous Poisson Process. We perform 
this test because if the data follows complete spatial randomness, then there is no need 
for more complicated statistical testing. We define a homogeneous Poisson Process 
as follows: 


Definition 4.1 Homogeneous Poisson Process: Let $N(A)$ denote the number of 
events in a region $A$, and $|A|$ the area of this region, then the data/events 
$X_1, X_2, \ldots, X_{N(A)}$ follow a homogeneous Poisson Process if the following 
conditions are fulfilled: 


1. For some $\lambda > 0$ and any finite region $A$, $N(A) \sim \text{Poisson}(\lambda |A|)$. 

2. Given $N(A) = n$, the $n$ events form an independently and identically distributed 
(iid) sample from the uniform distribution on $A$. 

3. For any two disjoint regions $A$ and $B$, the random variables $N(A)$ and $N(B)$ are 
independent. 

If our data follows a homogeneous Poisson Process, then it also follows complete 
spatial randomness (CSR). The parameter $\lambda$ is the rate or the intensity of the point 
process. Before we proceed to build more complicated models, we will test the 
null hypothesis of complete spatial randomness. We diagnose CSR through Monte 
Carlo sampling and empirical cumulative distribution functions (ECDFs), which 
we define below. 
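
Definition 4.1 can be made concrete by simulating a homogeneous Poisson Process on a rectangle. The sketch below is an illustration under stated assumptions rather than part of the analysis: the window is a simple rectangle instead of the boundary of South Sudan, the helper names `sample_poisson` and `simulate_csr` are hypothetical, and the Poisson count is drawn with Knuth's multiplication method, which is only adequate for moderate values of $\lambda |A|$.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method.

    Adequate for moderate means only, since exp(-mean) underflows for very
    large values.
    """
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_csr(lam, width, height, rng):
    """Simulate a homogeneous Poisson Process on [0, width] x [0, height]."""
    # Condition 1: the number of events is Poisson with mean lam * |A|.
    n = sample_poisson(lam * width * height, rng)
    # Condition 2: given n, the locations are iid uniform on the window.
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

rng = random.Random(42)
pattern = simulate_csr(lam=2.0, width=5.0, height=5.0, rng=rng)
print(len(pattern))  # fluctuates around lam * |A| = 50 across seeds
```

Condition 3 of the definition holds automatically for this construction, because counts in disjoint sub-rectangles of a Poisson pattern generated in this way are independent.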


Definition 4.2 Empirical cumulative distribution function (ECDF): If $X_1, \ldots, X_n$ 
are iid with CDF $F$, then the ECDF is $\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[X_i \le x]$. This is an 
unbiased estimator of $F(x) = P(X_i \le x)$. 


In our use of the ECDF, we rely on Monte Carlo sampling 
to construct simulation envelopes under CSR. We sample locations $x_1, x_2, \ldots, x_n$ 
uniformly on $A$ and construct $\hat{F}$ as defined above. We use empty space distances 
$d(u) = \min_i \|u - x_i\|$, the distance from a location $u$ to the nearest point. We then 
construct our estimate $\hat{F}(r)$ of 
$F(r) = P(d(u) \le r) = P(\text{at least one point within radius } r \text{ of } u)$. We then 
compare $\hat{F}(r)$ to $F(r)$ under CSR. 

We conduct this test for CSR and plot the estimates for the F function, the 
simulation estimate, and the observed F function below in Fig. 3. In these tests, if 
we see that the observed line is very close to the mean simulated line, then we have 
evidence for CSR. If the observed line lies outside of the simulation envelope, or the 
dark shaded area, then we have evidence against CSR. For the pointwise case, the 
simulation envelopes, or the shaded areas, are constructed by sorting the simulated 
values and taking the mth lowest and mth highest values (Baddeley et al., 2015). 
For the simultaneous case, the simulation envelope is slightly more complicated. 
For each simulation, we calculate the theoretical mean value of F under CSR 
and we calculate the maximum absolute difference between the theoretical curve 
and the simulated curve. After the simulations, we take the mth largest absolute 
deviation from all the simulations, $dev_m$, and this forms our simulation envelope 
through $lo = F_{theo} - dev_m$ and $hi = F_{theo} + dev_m$, where $F_{theo}$ is the theoretical 
mean value (Baddeley et al., 2015). We use two different techniques to estimate 
the F function: simultaneous and pointwise. We use both the simultaneous and 
pointwise method because through the use of the simultaneous envelopes, we get 
a smoother function but a much more conservative estimate of CSR as compared 
to the pointwise envelopes. In both tests, we have 800 simulations and choose m to 
be 20. 
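
The construction of the two kinds of envelopes can be sketched in plain Python. This is a simplified illustration and not the spatstat implementation: the window is a square, the observed pattern is an invented cluster, the helper names `f_hat` and `csr_envelopes` are hypothetical, edge effects are handled crudely by keeping the test locations at least the largest radius away from the boundary, and far fewer simulations are used than the 800 reported above.

```python
import math
import random

def f_hat(points, test_locs, radii):
    """Empirical empty-space function: for each r, the share of test
    locations whose nearest point lies within distance r."""
    d = [min(math.dist(u, x) for x in points) for u in test_locs]
    return [sum(di <= r for di in d) / len(d) for r in radii]

def csr_envelopes(n, width, height, test_locs, radii, n_sim, m, rng):
    """Pointwise and simultaneous envelopes for F under CSR, given n points."""
    lam = n / (width * height)
    # Theoretical F under CSR: 1 - exp(-lambda * pi * r^2).
    theo = [1 - math.exp(-lam * math.pi * r * r) for r in radii]
    sims = []
    for _ in range(n_sim):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        sims.append(f_hat(pts, test_locs, radii))
    # Pointwise: m-th lowest and m-th highest simulated value at each r.
    cols = [sorted(s[j] for s in sims) for j in range(len(radii))]
    pointwise = [(c[m - 1], c[-m]) for c in cols]
    # Simultaneous: m-th largest maximum absolute deviation from theory.
    devs = sorted((max(abs(a - b) for a, b in zip(s, theo)) for s in sims),
                  reverse=True)
    dev_m = devs[m - 1]
    simultaneous = [(t - dev_m, t + dev_m) for t in theo]
    return theo, pointwise, simultaneous

rng = random.Random(0)
# Test locations on an interior grid, at least 1 unit from the window edge.
test_locs = [(x, y) for x in range(1, 10) for y in range(1, 10)]
radii = [0.25, 0.5, 0.75, 1.0]
# Invented, strongly clustered "observed" pattern in one corner of the window.
observed_pts = [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(50)]
observed = f_hat(observed_pts, test_locs, radii)
theo, pointwise, simultaneous = csr_envelopes(
    50, 10.0, 10.0, test_locs, radii, n_sim=40, m=2, rng=rng)
outside = [obs < lo or obs > hi for obs, (lo, hi) in zip(observed, simultaneous)]
print(outside)
```

For a clustered pattern, large parts of the window are empty, so the observed F function falls below the lower simultaneous bound, which is the kind of evidence against CSR described above.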

In Fig. 3, we see the estimates of the F function for both the pointwise and 
simultaneous methods. Even for the more conservative method, the 
simultaneous envelope, we see strong evidence against complete spatial randomness 
(CSR), as the observed curve is not close to the mean simulated curve under CSR. 
Therefore, we conclude that spatial modeling is appropriate for our dataset. 

Fig. 3 We diagnose the absence of complete spatial randomness (CSR) through the simultaneous 
and pointwise simulation envelopes, as the observed curve lies outside of the envelope of the 
simulated curve 


4.2 Continuous Space Model 


Now that we have established that spatial modeling is appropriate, as the data does 
not follow complete spatial randomness, we start with a simple continuous space 
model. First, we treat our dataset as an inhomogeneous Poisson Process with an 
intensity $\lambda(x)$. We create a kernel density estimate for the intensity $\lambda(x)$ over space 
and time (Rowlingson and Diggle, 1993). First, we plot the estimate of the spatial 
intensity over the complete time window as well as the estimate for the temporal 
trend over the full area through a quartic kernel. In Fig. 4a, we see a high value 
of the kernel density estimate in the southern part of the country, as well as a couple 
of other peaks. With regard to the time dimension in Fig. 4b, the low count of conflict 
events at the end of the time window is due to lack of data in the last month of the 
dataset, as the dataset only runs through mid-December. We also see a low count 
of events at the beginning of the time window, which reflects the actual low 
frequency of events from 2011-2014, as described earlier. In other words, the low 
count at the beginning reflects an observed low frequency, whereas the low count at 
the end reflects a lack of data availability in the time window. We see a 
decrease and then increase in conflict in the middle of the dataset. 
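
A quartic kernel estimate of the spatial intensity can be sketched as follows. This is a minimal illustration under stated assumptions, not the routine used to produce Fig. 4: the function name `quartic_intensity` is hypothetical, the bandwidth and event locations are invented, and no edge correction is applied. The two-dimensional quartic kernel is taken to be $k(u) = \frac{3}{\pi}(1 - \|u\|^2)^2$ for $\|u\| \le 1$, which integrates to one over the unit disc.

```python
import math

def quartic_intensity(x, points, bandwidth):
    """Quartic kernel estimate of the spatial intensity at location x.

    Sums (1 / h^2) * k((x - p) / h) over all points p, where k is the
    two-dimensional quartic kernel. No edge correction is applied.
    """
    total = 0.0
    for p in points:
        # Squared distance to the point, in units of the bandwidth.
        u2 = ((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2) / bandwidth ** 2
        if u2 < 1.0:
            total += 3.0 / math.pi * (1.0 - u2) ** 2
    return total / bandwidth ** 2

# Invented event locations: a small cluster and one isolated event.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (4.0, 4.0)]
print(quartic_intensity((1.0, 1.0), points, bandwidth=0.5))  # inside the cluster
print(quartic_intensity((2.5, 2.5), points, bandwidth=0.5))  # far from all events
```

Because each kernel integrates to one, the estimated intensity integrates to approximately the number of events, so higher values mark locations where events are concentrated. The bandwidth controls how far each event's contribution is spread.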

Next, we plot the estimate of the intensity function over 12 month intervals for 
our 8 years of data. We note that although time is continuous, we discretize time 
into years in order to visualize the change in the kernel density estimate over time. 
In Fig. 5, we see that our spatial kernel density estimate changes over the year-long 
periods. In Fig. 5a, all of the kernel density estimates use the same scale, so we are 
able to see that there is really only one time period, 2016, in which the conflict events 
are highly concentrated in space. In the other years, events are fairly evenly 
distributed throughout the country at lower levels than in 2016. In 
Fig. 5b, we have a different color scale for each year of the data, so we can see 
the relative peaks within the given year of data and where the spatial peaks are 
shifting. Specifically, we see that there is a region in the southern part of South 
Sudan that often has fairly high conflict rates over the full 8 year conflict window. 
However, there are other areas in the country, such as in the north east, which have 
many conflict events in some periods but not others. 

Fig. 4 Through a kernel density estimate for the spatial intensity function, $\lambda(x)$, and the temporal 
intensity function, we see an estimate of the spatial distribution and temporal trend of conflict 
events from 2011-2018. (a) Spatial intensity function estimate. (b) Temporal intensity function 
estimate 

Fig. 5 Through the kernel density estimate of the intensity function $\lambda(x)$ over space and time, we 
see how the spatial distribution of conflict events changes over each year in our dataset. Through 
the common scale, we see there is only 1 year, 2016, with a strong spatial peak relative to the other 
time intervals. However, if we use a scale determined by each year, we can see where the conflict 
events are spatially concentrated by year. (a) Common scale. (b) Scale by year 


4.3 Gaussian Process 


Now that we have seen a preliminary estimate of the intensity function, we will 
model our continuous space data as a Gaussian Process in order to study diffusion 
mechanisms in the conflict process. We do this through fitting a log-Gaussian 
Cox Process (LGCP) through the spatstat package in R (Baddeley et al., 2015). 
In an LGCP model, the intensity function is defined as $\lambda(x) = \exp(Z(x))$ at 
location $x$, where $Z(x)$ is a Gaussian random field in the two-dimensional plane 
(Moller and Waagepetersen, 2003). The intensity of the LGCP is then governed 
by the Gaussian Process, $Z(x)$, which has covariance function $C(r)$, where $r$ is the 
distance between two points. We use an exponential covariance function so that the 
covariance function takes the following form: 


$$C(r) = \sigma^2 e^{-r/\alpha}$$


Through the spatstat package in R, we estimate $\sigma^2$ and $\alpha$ for our data. We plot 
the estimate of the covariance function to illustrate the spatial range of our data. 
We call the distance at which the covariance function falls below a chosen cutoff, 
which we specify as 0.1, the effective range of our events. It is called a range 
because it is the distance within which events still impact each other with 
non-negligible covariance. Because the exponential covariance never reaches exactly 
zero, we speak of an effective range: beyond this distance, the covariance is 
small enough that it is effectively 0. A larger effective range suggests that events have 
a stronger influence or dependence on events for a larger radius around those given 
events. There is no inherent rule for the choice of the cutoff, as this is a heuristic 
process. However, the covariance function is most accurate when the radius is small. 
Therefore, when we specify a smaller cutoff, we are effectively choosing a larger 
radius where the covariance function is not as accurate. In the figures in this section, 
the dashed line represents the cutoff point of 0.1. 
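
For an exponential covariance function, the effective range has a closed form, which the following sketch computes. It assumes the parameterization $C(r) = \sigma^2 e^{-r/\alpha}$, with variance $\sigma^2$ and scale $\alpha$. Setting $C(r) = c$ for a cutoff $c < \sigma^2$ and solving for $r$ gives $r = \alpha \ln(\sigma^2 / c)$. The function name `effective_range` and the parameter values are hypothetical and are not estimates from our data.

```python
import math

def effective_range(sigma2, alpha, cutoff=0.1):
    """Distance at which sigma2 * exp(-r / alpha) drops to `cutoff`.

    Returns 0.0 if the covariance is already at or below the cutoff at r = 0.
    """
    if sigma2 <= cutoff:
        return 0.0
    return alpha * math.log(sigma2 / cutoff)

# Hypothetical parameter values, chosen only to illustrate the calculation.
sigma2, alpha = 1.0, 0.1
r_eff = effective_range(sigma2, alpha)
print(r_eff)
print(sigma2 * math.exp(-r_eff / alpha))  # covariance at the effective range
```

The formula shows that, for a fixed cutoff, the effective range grows linearly in the scale $\alpha$ and only logarithmically in the variance $\sigma^2$, and that a smaller cutoff always yields a larger effective range.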

We note that the number of events in each category does not necessarily affect 
the value we find for the effective range. In other words, a smaller number of events 
in one category does not necessarily lead to a higher or lower estimate of our 
effective range. However, a smaller number of events in the category could affect 
the precision of our estimate, which we will address in Sect. 5. 

First, we compare the effective spatial range for each year of the conflict data in 
order to draw conclusions about the change in diffusion mechanisms over time. In 
Fig. 6, we plot the effective range through the covariance function estimate for each 
year. We see that the estimate of the covariance function does change over time in 
our dataset, but not drastically. We include the effective range over time in Table 3. 
We see that the effective range decreases from 2011 to 2014 and then increases from 
2014-2016 but this increasing trend does not hold through 2018. This implies that 
between 2011 and 2014, events started to exhibit a weaker dependence structure, 
where events had a smaller effect on surrounding events than previously. This makes 
intuitive sense as conflict events were relatively infrequent in the region throughout 
this time period. Between 2014 and 2015, the effective range increased, meaning that 
each event had a stronger effect over a greater distance, or a larger radius, r, around 
that given event. This also makes intuitive sense as there were large increases in 
conflict events during this time period. As stated earlier, there is not necessarily 
a relationship between more events in a given conflict event group and a higher 
or lower effective range. However, more events occurring in a certain period of 
time may suggest that these events would spur more events following them and 
have a stronger dependence mechanism, and would therefore have a larger effective 
range. The smaller effective range may also be due to the fact that there are multiple 
clusters of conflict events during these years (2013, 2014, 2015, and 2017), as seen 
in Fig. 5b. 

Fig. 6 When we plot the estimate of the covariance function by year, we see lower estimates of 
the effective range for 2014, 2015, and 2017 

Table 3 Effective range estimates by year 

Year   Effective range
2011   0.42
2012   0.32
2013   0.30
2014   0.22
2015   0.28
2016   0.32
2017   0.22
2018   0.34

Fig. 7 When we plot the estimate of the covariance function by actor type, we see higher estimates 
for the effective range for the state actor and civilian dyad and a small effective range when there 
are non-state actors and civilians involved 

Table 4 Effective range estimates by actors involved 

Actors involved          Effective range
Only state               0.28
Only non-state           0.30
Only civilian            0.29
State and non-state      0.25
State and civilian       0.37
Non-state and civilian   0.17

Next, we compare the effective range of conflict events based on the types of 
actors that are involved. We see in Fig. 7 and Table 4 that the effective range is 
highest for events where state actors and civilians are involved. The effective range 
is quite similar for other conflict events, especially as the radius gets larger. This 
suggests that the diffusion mechanism might be strongest when there are state actors 
and civilians involved in the conflict but other actors do not have a large effect on 
diffusion. This would indicate that when state actors and civilians are involved, the 
event may spur additional events around it more so than other conflict events, perhaps 
due to the contentious nature of these kinds of events. 

Fig. 8 When we plot the estimate of the covariance function by conflict type, we see a larger 
effective range for battles where non-state actors overtake territory or when government regains 
territory 

Table 5 Effective range estimates by conflict type 

Conflict type                                Effective range
Violence against civilians                   0.195
Riots/protests                               0.28
Battle-government regains territory          0.475
Battle-no change of territory                0.185
Battle-non-state actor overtakes territory   0.635


We also conduct this analysis based on conflict type. We see in Fig. 8 and Table 5 
that the event types with the strongest diffusion mechanisms are battles where 
the government regains territory and battles where non-state actors overtake the 
territory. It is notable that battles with no change in territory have a relatively weak 
diffusion mechanism when compared to these other battle types, suggesting that 
it is necessary to differentiate between these kinds of conflict events. We thus see 
that these two types of battles, where there is change in territory, may impact the 
surrounding more than battles where there is no change in territory or violence 
against civilians. Riots and protests lie in the middle of these conflict event types. 

Lastly, we also analyze these events by duration length. We show the difference 
in the diffusion mechanism for events that last only 1 day compared to events that 
last longer than 1 day. In Fig. 9 and Table 6, we see that events that last more than 1 
day exhibit a much stronger diffusion mechanism through the larger effective range. 
This makes intuitive sense as these longer events are likely to be larger-scale, more 
severe events, and therefore might spur other events. 

Fig. 9 When we plot the estimate of the covariance function by duration length, we see a larger 
effective range when the event lasts for longer than 1 day 

Table 6 Effective range estimates by duration length 

Duration length   Effective range
One day           0.14
More than 1 day   0.335

Through this analysis, we show that it is important to analyze diffusion mecha- 
nisms, such as the effective range of any given event, over time. We have shown 
that for our dataset, the dependence between events changes over time, and the 
effective range for conflicts in South Sudan decreased from 2011-2014, suggesting a 
weaker diffusion or dependence mechanism, but then increased from 2014-2016, as 
conflict escalated in the region. We have also shown that it is important to take into 
consideration event types, actor types, and duration of the event when considering 
diffusion mechanisms between events. 


5 Discussion and Future Work 


Our results illustrate that it is valuable to characterize the dependence between 
events when we have rich data in continuous space. We began by characterizing 
the dependence over time and space through density estimates and formally 
estimated this dependence through a log-Gaussian Cox Process (LGCP) model 
and the covariance function. We found that it is important to estimate changes in 
the diffusion mechanism over time and across actors and conflict types to detect 
differences in the diffusion process. 

The theoretical implications of our empirical analysis are three-fold. First, battles 
with territorial gains for one side tend to diffuse over larger distances than battles 
with no territorial change. This implies that the location of an individual conflict 
event and the clustering of multiple events or locations of events could have 
significant effects on the subsequent onset or termination of other conflict events. 
Based on our analysis, the southern part of South Sudan, which is the area around 
the capital, Juba, has a higher estimated intensity than other surrounding regions. 
A higher estimated conflict intensity around the southern part of South Sudan is 
evidence of the spatial interdependence among conflict events. This result reflects 
a historical conflict diffusion process of mass killings of the Nuer people by Dinka 
paramilitary groups in the capital, Juba, which were the pretext to the origin of the 
outbreak of civil war (Sawe, 2017). In the future, we would also like to control 
for certain demographic and socioeconomic characteristics over space and time, 
including population, housing, and transportation infrastructure. 

Second, modeling the spatio-temporal distribution of conflict events contributes 
to defining the conjuncture of initial conditions at locations where conflict took 
place and to capturing plausible mechanisms underlying changes in the spatial 
distribution or pattern of conflict over time. Our analysis shows that conflicts 
with longer duration exhibit stronger spatial dependence. Over the period since 
South Sudan became an independent country, conflict events are most spatially 
concentrated in 2016, when compared to the other years in our dataset. This finding 
suggests that the long-term unresolved ethnic tension between the government and 
rebel groups led to the outbreak of large-scale violence against civilians in 2016. 
Both the government and the rebel groups committed abuses against civilians in and 
around Juba and other areas. According to the Global Conflict Tracker and Human 
Rights Watch, the government and the rebel group targeted civilians along ethnic 
lines, and millions of people were displaced or forced to flee their homes. Military 
groups committed rape and sexual violence, destroyed property and looted villages, 
and recruited children into their ranks. The UN even warned that this ongoing 
ethnic war was likely to transform into genocide (GlobalConflictTracker, 2019; 
HumanRightsWatch, 2017). Therefore, in future research, it is crucial to investigate 
how ethnic identity might serve as an underlying mechanism reshaping the spatial 
patterns of conflict. 

Third, we have shown that it is important to consider several key characteristics 
of conflict events when analyzing the dependence structure of these events. For 
example, we show that temporal analysis, rather than simply a cross-section, may 
be important in these analyses, which agrees with existing literature that also studies 
temporal dynamics (Read and Mac Ginty, 2017; Silwal, 2013). This temporal 
analysis should be expanded in future work. We also illustrate that the types of 
actors that are involved with conflicts can play a crucial role in determining if that 
event is likely to influence other events. Specifically, we find that when state actors 
and civilians/rioters/protesters are involved, this is likely to have a stronger effect 
on surrounding events, which bridges a gap in the literature (Cunningham et al., 
2013). We also find that it is important to consider event types, as battles where 
there are changes in territories have stronger diffusion mechanisms than other types 
of conflict events. This adds to the literature, as most studies do not investigate 
diffusion mechanisms between the types (Wood, 2010; Azam and Hoeffler, 2002; 
Balcells, 2010). Lastly, we find that events that last only 1 day vs longer than 1 
day also have quite different dependence structures, as events that last longer than 1 
day have much stronger dependence on events surrounding them. To the best of our 
knowledge, this study is the first that empirically examines duration using the day- 
span as the unit of analysis and carefully examines whether the events that persist 
only 1 day are important. Our results were gathered through the use of continuous 
space models and point level data, and therefore represent an advancement on most 
existing methods. 

Some caveats apply to this analysis. Most importantly, in 
the estimate of our covariance function, there are more points at a smaller radius 
around a point and fewer points farther away. Therefore, as we get 
to a larger radius, the covariance estimate is less reliable and we interpret the 
difference in the covariance function with some caution at larger distances. As 
mentioned earlier, if there are fewer events in a given category, the precision of 
the covariance function estimate, and therefore also the effective range estimate, 
may be affected. One way to assess the variability of these estimates is through a 
parametric bootstrap where we would simulate multiple realizations from a fitted 
model and assess the variability in the resulting estimates of the effective range. 
However, given the computationally intensive nature of these models, we interpret 
our results with caution, only draw conclusions where there are large differences in 
the effective range estimates, and suggest this for future work. 

In the future, we could consider different formal models for diffusion mechanisms, 
drawing on the epidemiology literature. However, the log-Gaussian Cox Process 
methodology provides an easily interpretable mechanism to analyze diffusion in the 
context of conflict event data. We would also like to incorporate stronger temporal 
measures of dependence into our analysis, such as through continuous time models, 
as well as covariate information. An analysis by conflict severity, based on the 
number of casualties in addition to our analysis of severity by duration of the 
conflict, could also be an important next step in this analysis. 
Rebel Group Protection Rackets: 
Simulating the Effects of Economic 
Support on Civil War Violence 


Frances Duffy, Kamil C. Klosek, Luis G. Nardin, and Gerd Wagner 


Abstract Rebel groups engage in a series of economic transactions with their local 
populations during a civil war. These interactions resemble those of a protection 
racket, in which aspiring governing groups extort the local economic actors to fund 
their fighting activities and control the territory. Seeking security in this unstable 
political environment, these economic actors may decide to flee or to pay the rebels 
in order to ensure their own protection, impacting the outcomes of the civil war. We 
present a simulation model (executable at https://gnardin.github.io/RebelGroups) 
that attempts to capture the decision-making and behavior of the involved actors 
during protection racket interactions as well as the cooperation and competition 
between rebel groups to control territory. Our model reveals insights about the 
mechanisms that are helpful for understanding violence outcomes in civil wars, and 
the conditions that may lead rebel groups to prevail. Analysis of various scenarios 
demonstrates the impact that different security factors play on civil war dynamics. 
Using Somalia as a case study, we also assess the importance of the rebel groups' 
economic bases of support in a real-world setting. 
1 Introduction 


How do rebel groups control territory and engage with the local economy during 
civil war? Charles Tilly's seminal War and State Making as Organized Crime (Tilly, 
1985) posits that the process of waging war and providing governance resembles 
that of a protection racket, in which aspiring governing groups extort local popula- 
tions to gain power, and civilians or businesses pay to ensure their own protection. 
We present an agent-based simulation model that attempts to capture the decision- 
making and behavior of the actors involved: rebel groups and enterprises. The 
model is available online at https://gnardin.github.io/RebelGroups as a web-based 
simulation and can be run in any modern web browser. 

The use of agent-based modeling in civil war research is not a new endeavor. 
The groundwork was laid by Epstein (2002), who analyzes the conditions under which 
individuals may mobilize and protest. He examines factors such as the legitimacy 
of a political system, risk-aversion of potential protesters, police strength, and 
geographic reach. Several researchers have since expanded upon Epstein's work 
by focusing on greed and grievances in rebellious movements (Goh et al., 2006), 
crime (Fonoberova et al., 2018), and network effects caused by the spread of 
media (Lemos et al., 2016). Our model also builds upon earlier research by focusing 
on civil war dynamics that could emerge after the initial rebellion. It focuses on 
the relationships between economic and security factors as determinants of inter- 
rebel warfare by treating collective rebel groups as well as business enterprises as 
individual agents. 

Conceptually, the opportunity for rebel groups to engage in extortion of local 
populations most often arises either in a condition of relative anarchy or when the 
incumbent government is too weak to protect local enterprises itself (Fjelde and 
Nilsson, 2012). Some scholars theorize that a security dilemma influences actors to 
support or form rebel groups for protection in order to balance against threats of 
violence and competition from one another (Posen, 1993). While this theory may 
capture some of the motivation for enterprises to seek security, it does not explain 
how this process takes place and why rebel groups, who may face varying degrees of 
security threats to their own interests, would be motivated to take and hold territory 
while controlling local populations. 

Several scholars address the question of when rebel groups choose to develop 
a relationship with civilians and when they choose to attack or loot them instead. 
Weinstein (2006) argues that the presence of outside support and lootable resources 
increases the likelihood that rebel groups will fail to seek a cooperative relationship 
with civilian populations under their control, because they can expand economically 
without extorting local businesses. By contrast, rebel groups that fight on behalf of 
ideological goals may be more likely to seek the cooperation of local communities, 
relying on them for provision or extortion of resources and providing protection 
or other benefits in order to maintain this relationship (Weinstein, 2006). Further, 
relatively weak rebel groups with rising battlefield costs, which are not strong 
enough to engage in extortion activities, may also loot civilians rather than engage in 


Simulating Rebel Group Protection Rackets in Civil War 227 


a process of periodic extraction from the local population (Wood, 2014). As losses 
increase, rebels are more likely to desert the rebel group, be killed in fighting, or be 
expelled as the group can no longer afford to compensate them (Weinstein, 2005). 

Therefore, the economic conditions in a conflict territory, as well as the funding 
characteristics of a rebel group, influence the extortion of local enterprises. These 
extortion activities have consequences for the behavior of rebel groups in conflict. 
When two different rebel groups contest territory or seek the extortion of enterprises 
under the control of another group, we can expect fighting between the rebels (Fjelde 
and Nilsson, 2012). The groups may seek greater economic gains, continued domi- 
nance over an area of territory, or simply to protect and defend the local populations 
that they themselves extort. These incentives may motivate competition and battles 
between groups, which depend on the available resources and capabilities. Thus, 
the extortion racket they develop permits a continued process of expansion aiming 
at hegemony and protection of the territory. 

Those who support a rebel group and benefit from its protection may provide 
information or intelligence on the group’s rivals, or even be recruited to join its 
ranks, which further increases the group’s fighting capacity (Barter, 2012; Kalyvas, 
2006). Apart from providing temporary jobs as fighters, local populations can also 
economically benefit depending on the status quo ante. For instance, Leeson (2007) 
and Powell et al. (2008) find that the statelessness conditions of Somalia between 
1991 and 2005 were more favorable for local business actors than the previous 
predatory regime of Siad Barre. This holds mainly in economic sectors that are less 
dependent on government authority but suffer due to corruption and forced seizures 
of assets. Alternatively, if rebel groups are unable to provide protection following 
extortion or they themselves engage in violent attacks and looting, business actors 
may choose to support a rival group or even to flee. Whether due to direct attacks or 
inter-rebel group fighting, an increase in the severity of violence motivates business 
actors to flee for safety (Barter, 2012; Steele, 2009). 

Ultimately, a stable point may be reached at which rebel groups gain enough 
territory and sufficient trust from the local population to establish themselves as 
a legitimate governing institution. This shift may come with local adjudication 
responsibilities and public goods provision. Having established an ad hoc local 
governing institution, the system transitions away from the competitive security 
environment and the system is no longer in anarchy. At this point, one group 
achieves hegemony over all competitors. A stalemate may also be reached if two 
or more rebel groups balance against each other’s power, with neither group gaining 
the capability to seek the extortion of its opponent’s population (Walt, 1985; Waltz, 
1979). 

As civil war research continues to probe the political mechanisms that fuel local 
disputes and the origination of violence (e.g., Ostby, 2008; Cunningham, 2013; 
Cederman and Vogt, 2017; Walter, 2017), our agent-based simulation model 
explores the economic relationships of rebel groups with their local populations. The 
model captures the extortion of local enterprises by rebel groups, their decisions to 
expand the territory they control, as well as the decisions of enterprises whether to 
report extortion or to flee. We use the model to perform security-related experiments 
using a theoretical system of three warring rebel groups, examining their impacts 
on the economy and the importance of their economic bases of support for their 
sustainability during a civil war. This analysis provides insights for understanding 
the causes and byproducts of rebel competition in present-day conflicts, such as 
the case of Somalia. We therefore also apply the model to historical scenarios 
experienced over the evolution of Somalia's civil war and derive some initial 
implications from our findings. 


2 Theoretical Underpinnings 


Intrastate armed conflicts or civil wars require rebel groups to raise revenue and 
gather resources in order to mobilize and to sustain the offensive and defensive 
potential necessary for long-term survival. Internally, a rebel group also needs 
to establish a structured organization to sustain group cohesion until the armed 
conflict ceases. Both mobilization and structuring incur distinguishable costs on 
rebel actors (Wennmann, 2009). According to Olson (1993), those rebel groups that 
are able to monopolize violence in a confined territory and establish themselves as 
the predominant local institutional structure are able to extract a permanent revenue 
stream through local taxation. This provides groups with crucial advantages as 
compared to “roving” actors who survive on incidental and temporary extraction 
gains. 

We assume that rebel groups interact in an anarchic environment. No 
superior actor can alleviate information asymmetries or commitment problems. 
In their pursuit of survival, rebel groups rely on their own capacities and are 
unrestricted in their choice of actions. This assumption is prevalent in the study 
of International Relations, in which states are the highest order actors. It is also 
popular in civil war studies, in particular in literature on bargaining failures (Spaniel 
and Bills, 2016; Nygard and Weintraub, 2014) and discussions of the ethnic security 
dilemma (Roe, 1999; Johnson, 2015). Similarly, Olson (1993) uses the assumption 
of anarchy as a backdrop of inter-group competition development. 

Academic research has also examined different types of rebel revenue streams. 
Armed actors during the Cold War were primarily financed by external patrons 
such as the USA, the Soviet Union, Cuba, China, France, and other countries 
posturing on the world stage (Schmidt, 2013). In the 1990s, in the wake of the fall 
of the Soviet Union and the ideological contention that accompanied it, the world 
established new norms of non-intervention, and sub-national armed actors needed 
to find replacements for their former patrons. Natural resource extraction featured 
most prominently (Lujala, 2008; Lujala et al., 2005; Collier et al., 2008), in addition 
to migrant and diaspora remittances (Regan and Frank, 2014; Escribà-Folch et al., 
2018) and looting (Wennmann, 2011). 

However, rebel groups may still rely on local extraction of funds and supplies. 
In order to achieve an enduring and consistent revenue stream, rebel groups create 
protection rackets against business actors operating within the area in which the 
armed conflict takes place. For example, in the Niger Delta region of Nigeria, 
local rebel groups kidnap corporate employees and release them in exchange for 
ransom (Ikelegbe, 2006). Revenues obtained through piracy also often end up in 
rebel pockets (Daxecker and Prins, 2017). Business elites in Somalia are required by 
local warlords to pay road taxes in addition to periodical payments in exchange for 
security (Ahmad, 2015). During civil wars in Sierra Leone and Liberia, rebel leaders 
instructed foot soldiers to forcibly extract revenues from local civilians, which they 
did through the manning of crucial local and cross-border trade checkpoints (Reno, 
1999). All of these actions occurred in rebel-held territory and allowed rebel groups to pay 
for food, arms, and shelter for their members. 


2.1 Rebel Group Extortion and Looting 


Rebel groups have two potential options for obtaining revenue from local enter- 
prises (Shearer, 2000). First, they may extort enterprises by periodically extracting 
a proportion of their revenue, which incrementally increases their wealth. The rebel 
group must take care that the extracted amount does not endanger the sustainability 
of the enterprise; otherwise, it will be left without a continuous revenue stream in the 
future. The alternative approach is looting, in which the rebel group seizes all of the 
wealth of an enterprise. This occurs primarily under three conditions. First, the rebel 
group behaves like a “roving bandit” and simply pillages the enterprise without any 
desire to control the territory where it is located (in the model, "territory" is meant 
not as a specific geographic location but in a metaphoric sense). Second, the rebel 
group punishes and loots enterprises that pay extortion money to an adversary rebel 
group. Third, a weak rebel group may find itself in fierce competition with a much 
stronger rebel group. In order to compensate for the strength of its opponent, it may 
attempt to collect more wealth over a short period of time through repeated looting. 


2.2 Enterprise Fleeing 


Enterprises have the inherent desire to create wealth and generate income, even 
under conditions of civil war. However, assuming no military or defensive capa- 
bilities of their own, and assuming no official government protection, they rely 
on protection provided by armed rebel groups. Cooperation with a group may 
be beneficial. If a rebel group limits its extortion amount to allow continued 
revenue earnings and growth, an enterprise can sustain itself and even overcome its 
competitors who incur losses due to insecurity. However, if an enterprise is depleted 
of its wealth due to looting, it may decide to flee from the armed conflict rather than 
attempting to rebuild its business, since it might risk further unwanted extraction 
or threats to physical safety. For instance, Collier and Duponchel (2013) show that 
during 10 years of civil war in Sierra Leone from 1992 to 2001, businesses shrank 
in size. 


2.3 Enterprise Reporting 


Enterprises may favor one particular rebel group, determined by the fraction of 
their revenue that is extorted: the favored group is the one that extracts the least money 
compared with its rivals. Assuming a purely economic system, we do not consider 
shared ethnic identity or other aspects of the group's reputation as influences 
on an enterprise's preference. Reporting occurs when enterprises within the 
territory of one rebel group are extorted by a competing group. We base this logic on 
the experiences of enterprises who are extorted by mafia groups (see Nardin et al., 
2016). An enterprise may choose to report the extortion attempt to its preferred 
(main) rebel group based on the amount extracted. 


2.4 Rebel Group Fighting and Expansion 


Fighting is the attempt of rebel groups to achieve hegemony during a civil war and to 
eliminate competing groups. We summarize the conditions for rebel fighting under 
three broad assumptions. First, two groups fight due to a competitive desire for 
power and expansion. Each group has an inherent wish to expand, and the more 
powerful a group, the more likely that it initiates fighting in order to conquer 
other groups. Fjelde and Nilsson (2012) have examined the proclivity of rebel 
groups to engage in armed conflict with one another, showing that larger capability 
discrepancies result in an increased probability of fighting. Second, rebel groups 
may wish to extort funds and resources from enterprises already controlled by 
a rival group. This broadens the economic base of a rebel group and allows it 
to recruit more soldiers, which in turn renders the group more competitive. The 
third influencing condition is enterprise reporting. Once extorted, an enterprise may 
choose to report to its main rebel group, increasing the probability of fighting by 
revealing strategic information about the group's whereabouts. A group successfully 
expands when enterprises are reallocated from the losing to the winning rebel 
group. 


2.5 Rebel Group Cooperation 


Under conditions of anarchy, rebel groups may counterbalance against one another 
in order to achieve power parity, seeking to prevent the achievement of hegemony 
by any one group. Therefore, two weaker rebel groups may combine forces and 
cooperate against a stronger group. This closely resembles the logic of balance-of- 
power in International Relations theory (Walt, 1985; Waltz, 1979). We apply this 
concept to sub-national actors in a civil war. Cooperation between relatively weak 
groups becomes more likely the higher the power disparity between an attacking 
rebel group and its targets. We assume only defensive cooperative behavior: groups 
join forces only to defend against attacks from a stronger group. We do not model 
offensive cooperation, in which multiple groups combine forces to attack a larger group. 
We also assume that the costs of fighting incurred by an attacker are dependent on 
the combined strength of the cooperating target rebel groups, who may be relatively 
easy or relatively difficult to defeat. The costs inflicted by the attacker are distributed 
among the cooperating target groups inversely proportional to their strength. 
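
The text does not give an explicit formula for this cost split. As a minimal sketch, assuming that "inversely proportional" means normalized inverse-strength weights, the distribution could look as follows. The function name `split_costs` and the example numbers are hypothetical and not taken from the model's implementation.

```python
def split_costs(total_cost, strengths):
    """Split total_cost among cooperating groups in inverse proportion
    to their (strictly positive) strengths.

    Assumption: share_i = (1 / s_i) / sum_j (1 / s_j).
    """
    inverse = [1.0 / s for s in strengths]
    norm = sum(inverse)
    return [total_cost * w / norm for w in inverse]


# Two allies, the second twice as strong as the first:
# the weaker ally bears two thirds of the cost.
shares = split_costs(90.0, [1.0, 2.0])
```

Under this reading the weaker ally always bears the larger share, and the shares sum to the total cost inflicted by the attacker.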


2.6 Rebel Group Recruitment 


Over time, a rebel group hires or "recruits" new rebels to increase its size and 
maintain its fighting capabilities. The exact number of rebel recruits is determined 
by the following conditions. First, the cost of hiring and retaining recruits must be 
deducted from the group's overall wealth. Therefore, the amount of wealth deter- 
mines how many rebels can be hired in the future. Second, rebels are constrained 
by recruit availability when seeking new hires. We assume a constraint that limits 
groups from recruiting new members who number more than a certain percentage 
of their current size. Rebel groups are unlikely to double or triple in size during 
the relatively short time periods of intermittent recruitment decisions, although they 
may substantially grow over the longer term. Third, the power disparity between 
rebel groups determines the rate of recruitment. The larger the power disparity, the 
slower the larger rebel group grows. Since an increase in group size increases the 
probability of winning a fight, the severity of damage to the opponent, and the costs 
of financing, growth yields decreasing marginal returns. We assume that once 
a global power ratio of 4:1 is reached (i.e., the largest rebel group has 
four times as many rebels as the opposition), the hegemonic rebel 
group pauses recruitment. This choice reflects the diminishing marginal benefit a rebel group 
gains by growing in size. At a power preponderance of this scale, the benefit of any 
further increase in size is minuscule, as it becomes highly improbable that the remaining rebel 
groups will overpower the hegemon. Since the civil war is permanent in this model, 
the hegemon still attempts to expand and will eventually crowd out the remaining 
rebel groups. 
3 Rebel Group Protection Rackets Model 


In this section, we describe in detail the Rebel Group Protection Rackets model 
(Sect. 3.1), introduce the model variables that define a scenario (Sect. 3.2), and 
briefly describe implementation details of the model (Sect. 3.3). 


3.1 Model Description 


Based on the assumptions presented above, we conceptualize an agent-based model 
of protection rackets in a civil war conflict zone inside a weak or failed state.¹ 

Here we adopt the Object Event Modeling and Simulation paradigm, which uses 
the concepts of object types, event types, and event rules for modeling discrete 
dynamic systems (Wagner, 2018). The objects of the system being modeled may 
be passive entities or active entities ("agents"), while the events that happen in 
this system may represent environment events or actions of an agent. Objects and 
events are classified by object types and event types, which may define type-specific 
properties (and operations). Event rules represent causal regularities leading to 
changes in the states of affected objects and to follow-up events. 

Figure 1 shows the information design model for the Rebel Group Protection 
Rackets model, which is composed of two object types, RebelGroup and Enterprise, 


[Figure 1 diagram: UML class diagram. Object type RebelGroup with attributes wealth : Decimal, nmrOfRebels : NonNegativeInteger, rebelCost : Decimal, recruitThreshold : Decimal, recruitRate : Decimal, extortionRate : Decimal, reports : NonNegativeInteger, freqDemand : NonNegativeInteger, freqExpand : NonNegativeInteger, lastExpand : NonNegativeInteger, lastAmountCollected : Decimal. Object type Enterprise with attributes wealth : Decimal, income : Decimal, freqIncome : NonNegativeInteger, accIncome : Decimal, fleeProb : Decimal, fleeThreshold : NonNegativeInteger, nmrOfExtortions : NonNegativeInteger, nmrOfLootings : NonNegativeInteger. Event types: Extort, AllocateWealth, Income, Loot, Flee, Report, Fight, Demand, Expand. Association ends: extorted enterprises, extorters, main rebel group (0..1), extorter, attacker, opponent.] 

Fig. 1 Information design model for the Rebel Group Protection Rackets model as a Unified 
Modeling Language (UML) class diagram. The * symbol represents association cardinality, e.g., in 
the main rebel group association, one RebelGroup can be the main rebel group of multiple 
enterprises, but one enterprise can have at most one main rebel group 


¹ Notice that this is a stylized fact model; thus, we adopt several simplifications that nevertheless 
capture the main characteristics of the phenomenon of interest. 
and nine event types, AllocateWealth, Demand, Expand, Extort, Fight, Flee, Income, 
Loot, and Report. 

Enterprises represent local businesspersons and liberal professionals who con- 
duct business in a territory that may be under the control of a RebelGroup. 
Enterprises aim to make enough profit to support their household by avoiding, if 
possible, the payment of extortion money. 

RebelGroups are armed groups that compete among themselves aiming to 
enlarge their territorial domain² by increasing the number of Enterprises under their 
control. A RebelGroup may have multiple Enterprises under its control (extorted 
enterprises³ association) and an Enterprise can be under the control of at most 
one RebelGroup at any moment (main rebel group association). 

A RebelGroup is composed of a number of rebels (nmrOfRebels) that defines 
its size and strength. The strength of a RebelGroup can be 
evaluated in relation to all other RebelGroups (i.e., global strength) or in relation to 
an opponent RebelGroup (i.e., relative strength). The global strength is calculated 
as the RebelGroup's number of rebels divided by the total number of rebels of all 
RebelGroups. The relative strength is calculated as the RebelGroup's number of 
rebels divided by the sum of rebels of the RebelGroup and its opponent. 
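
The two strength measures just defined can be illustrated with a short sketch. The function names, the dictionary layout, and the handling of a zero total are assumptions made for illustration; only the two ratios themselves come from the text.

```python
def global_strength(group, rebels_by_group):
    """Number of rebels of `group` divided by the total number of rebels
    of all RebelGroups. Returns 0.0 if there are no rebels at all
    (an assumption; the text does not cover this case)."""
    total = sum(rebels_by_group.values())
    return rebels_by_group[group] / total if total > 0 else 0.0


def relative_strength(n_own, n_opponent):
    """Number of own rebels divided by the sum of own and opponent rebels."""
    total = n_own + n_opponent
    return n_own / total if total > 0 else 0.0


# Hypothetical three-group system.
rebels = {"A": 50, "B": 30, "C": 20}
gs_a = global_strength("A", rebels)             # 50 / 100
rs_a_vs_b = relative_strength(rebels["A"], rebels["B"])  # 50 / 80
```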

RebelGroups and Enterprises interact via a set of events that are generated 
exogenously (e.g., Demand, Expand, and Income events) or endogenously (e.g., 
AllocateWealth, Extort, Fight, Flee, Loot, and Report events). Exogenous events 
are recurrently generated based on some defined frequency. Endogenous events are 
caused by some other event. These types of events can be combined to define 
processes that are composed of an exogenous event acting as process initiator 
followed by a series of endogenous events. There are three processes in our model: 
Income Process, Demand Process, and Expand Process. 


3.1.1 Income Process 


The Income Process is composed of a single exogenous Income event. Each 
Enterprise is associated with an Income event, which causes the Enterprise to 
periodically (freqIncome) receive an income (income) corresponding to 
its business activities. The income received increases the Enterprise's wealth 
(wealth) and its accumulated income (accIncome). The accumulated income 
represents the revenue received by the Enterprise since the last time it was extorted 
or looted, and it is used for the calculation of the extortion amount requested by the 
RebelGroups. 


² In the model, “territory” is abstract and it does not mean a specific geographic location. 


³ The Monospace font is used to indicate properties and associations presented in the information 
design model (Fig. 1). 
[Figure 2 diagram: flow of the Demand Process. Labels shown: Demand, For Each Enterprise, Extort?, Extort, AllocateWealth, End Demand.] 


Fig. 2 Diagram illustrating the series of events and their interrelationships in the Demand Process 


3.1.2 Demand Process 


RebelGroups control portions of the territory and they periodically demand 
resources from the Enterprises under their control in order to support their fighting 
efforts. 

Figure 2 illustrates the series of events composing the Demand Process that is 
periodically initiated by the Demand event. 

There is one Demand event associated with each RebelGroup, which makes 
RebelGroups periodically (freqDemand) decide whether to extort (Extort event) 
or loot (Loot event) the Enterprises under their control. The decision whether to 
extort or loot is individually made by the RebelGroup for each Enterprise, but all 
Enterprises under the RebelGroup's control are extorted or looted at the same time. 
The individual decision whether to extort or loot is probabilistically based on the 
RebelGroup's global strength. Thus, weaker RebelGroups have a greater probability 
of looting the Enterprises under their control than stronger RebelGroups due to the 
greater pressure on the former to extract resources quickly to increase in size and 
become more competitive. 

If deciding to extort (Extort event), RebelGroups request a fraction 
(extortionRate) of the Enterprises' accumulated income (accIncome) as 
extortion payment. Enterprises pay the lesser of the demanded amount and 
their wealth. This amount is transferred from the Enterprise to the RebelGroup. 
Thus, RebelGroups increase their wealth (wealth) and the value of the last amount 
collected (lastAmountCollected) accordingly, and Enterprises decrease their 
wealth (wealth) by the amount of extortion money paid, reset their accumulated 
income (accIncome) to zero, and increase their number of extortions by one 
(nmrOfExtortions). 
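
As a minimal sketch of the state changes described for the Extort event, the following Python fragment mirrors the properties of the information design model in snake_case. The class layout and the function name `extort` are illustrative assumptions and do not reproduce the published web-based implementation.

```python
from dataclasses import dataclass


@dataclass
class Enterprise:
    wealth: float
    acc_income: float = 0.0
    nmr_of_extortions: int = 0


@dataclass
class RebelGroup:
    wealth: float
    extortion_rate: float
    last_amount_collected: float = 0.0


def extort(group, enterprise):
    """Apply one Extort event and return the amount actually paid."""
    demanded = group.extortion_rate * enterprise.acc_income
    # The Enterprise pays the lesser of the demanded amount and its wealth.
    paid = min(demanded, enterprise.wealth)
    enterprise.wealth -= paid
    enterprise.acc_income = 0.0
    enterprise.nmr_of_extortions += 1
    group.wealth += paid
    group.last_amount_collected += paid
    return paid


# Hypothetical numbers: the demand (0.2 * 200 = 40) exceeds the
# Enterprise's wealth (30), so only 30 is transferred.
group = RebelGroup(wealth=100.0, extortion_rate=0.2)
enterprise = Enterprise(wealth=30.0, acc_income=200.0)
paid = extort(group, enterprise)
```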

When looting occurs (Loot event), Enterprises transfer their total amount of 
wealth to a RebelGroup. Thus, RebelGroups increase their wealth (wealth) 
and the value of the last amount collected (lastAmountCollected) 
accordingly, and Enterprises reset both their wealth (wealth) and accumulated 
income (accIncome) to zero, and increase by one their number of lootings 
(nmrOfLootings). Due to looting, however, an Enterprise may flee (i.e., leave 
the simulation) with a certain probability (fleeProb) or if the number of previously 
endured lootings is greater than a certain threshold (fleeThreshold). If an 
Enterprise decides to flee (Flee event), the Enterprise is removed from the list of 
extorted Enterprises and from the simulation. 
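
The Loot event and the subsequent flee decision can be sketched in the same spirit. The names below are hypothetical, and one detail is an assumption: the text does not say whether the looting that has just occurred counts toward the threshold comparison, so this sketch compares the counter after it has been incremented.

```python
import random
from dataclasses import dataclass


@dataclass
class Enterprise:
    wealth: float
    flee_prob: float
    flee_threshold: int
    acc_income: float = 0.0
    nmr_of_lootings: int = 0


def loot(enterprise):
    """Apply one Loot event and return the wealth seized by the RebelGroup."""
    seized = enterprise.wealth
    enterprise.wealth = 0.0
    enterprise.acc_income = 0.0
    enterprise.nmr_of_lootings += 1
    return seized


def decides_to_flee(enterprise, rng):
    """Flee if the looting count exceeds the threshold, or otherwise
    with probability flee_prob (assumption: counter checked after increment)."""
    if enterprise.nmr_of_lootings > enterprise.flee_threshold:
        return True
    return rng.random() < enterprise.flee_prob


rng = random.Random(0)
e = Enterprise(wealth=50.0, flee_prob=0.0, flee_threshold=1, acc_income=20.0)
first_seized = loot(e)
flees_after_first = decides_to_flee(e, rng)   # 1 looting, threshold 1
e.wealth = 10.0                                # some income arrives later
second_seized = loot(e)
flees_after_second = decides_to_flee(e, rng)  # 2 lootings exceed threshold
```

With `flee_prob` set to zero, as in this example, only the threshold rule can trigger a Flee event, which makes the two mechanisms easy to separate when experimenting.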

After all demands are complete, RebelGroups reallocate their resources 
(AllocateWealth event). First, RebelGroups compensate their rebels, but if they do not 
have enough wealth to compensate all of them, they reduce their number of rebels 
by the difference between the wealth and the total compensation divided by the cost 
per rebel (rebelCost). 

Additionally, the RebelGroup may recruit or expel rebels if the amount collected 
since the last wealth allocation was less than the amount necessary to compensate 
the rebels. The number of rebels recruited (nmrRecruit) or expelled (nmrExpel) by 
a RebelGroup is defined by 


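\[ \mathit{delta} = \frac{\mathit{lastAmountCollected} - (\mathit{nmrOfRebels} \times \mathit{rebelCost})}{\mathit{rebelCost}}, \]

\[ \mathit{nmrRecruit} = \min\bigl(\mathit{delta} \times (1 - \mathit{globalStrength}),\ \mathit{nmrOfRebels} \times \mathit{recruitRate}\bigr), \]

\[ \mathit{nmrExpel} = \min(\mathit{nmrOfRebels}, \mathit{delta}). \]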
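
A small worked example may help to read these three expressions. The sketch below assumes that recruitment applies when delta is positive (a surplus) and that expulsion uses the magnitude of a negative delta (a shortfall); this sign convention and the absence of rounding are assumptions, since the text does not spell them out. All numbers are hypothetical.

```python
def delta_rebels(last_amount_collected, nmr_of_rebels, rebel_cost):
    """Surplus (positive) or shortfall (negative), in units of rebels."""
    return (last_amount_collected - nmr_of_rebels * rebel_cost) / rebel_cost


def nmr_recruit(delta, global_strength, nmr_of_rebels, recruit_rate):
    """Recruits are capped by the affordable surplus, scaled down for
    stronger groups, and by a fraction of the current size."""
    return min(delta * (1 - global_strength), nmr_of_rebels * recruit_rate)


def nmr_expel(nmr_of_rebels, shortfall):
    """A group cannot expel more rebels than it has."""
    return min(nmr_of_rebels, shortfall)


# Surplus case: 1500 collected, 100 rebels costing 10 each.
d_surplus = delta_rebels(1500.0, 100, 10.0)        # (1500 - 1000) / 10
recruits = nmr_recruit(d_surplus, 0.4, 100, 0.2)   # min(30, 20)

# Shortfall case: only 700 collected for the same group.
d_shortfall = delta_rebels(700.0, 100, 10.0)       # (700 - 1000) / 10
expelled = nmr_expel(100, abs(d_shortfall))        # min(100, 30)
```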


3.1.3 Expand Process 


RebelGroups intrinsically try to expand their territorial control over new Enterprises 
by fighting other RebelGroups or simply extracting resources from these 
Enterprises. 

Figure 3 illustrates the series of events comprising the Expand Process that is 
periodically initiated by the Expand event. 


[Figure 3 diagram: flow of the Expand Process. Labels shown: Expand, Expand?, Fight?, Fight, Report, Flee?, Flee.] 


Fig. 3 Diagram illustrating the series of events and their interrelationships in the Expand Process 
There is one Expand event associated with each RebelGroup, which makes 
RebelGroups periodically (freqExpand) choose to expand territory if they have 
rebels and there are still Enterprises not under their control. The expansion decision 
also depends on a RebelGroup's global strength and the elapsed time since its last 
expansion. Stronger RebelGroups have greater opportunity to expand their domain, 
yet weak RebelGroups increase their chance of expanding as time elapses. Thus, the 
probability of a RebelGroup deciding to expand is defined by 


\[
\mathit{expandProb}(\alpha, t, t_{\mathit{last}}) = \text{[expression lost in extraction]}
\]

where $\alpha$ is the RebelGroup's global strength, $t$ is the current time, and $t_{\mathit{last}}$ is the last
time the RebelGroup expanded (lastExpansion).

If the RebelGroup decides to expand, then it evaluates all Enterprises not 
under its control and chooses one Enterprise randomly with a probability inversely 
proportional to the Enterprise's wealth (i.e., poorer Enterprises are chosen with 
greater probability). If the chosen Enterprise is already under the control of 
another RebelGroup, the expanding RebelGroup decides whether to fight against 
the Enterprise's main RebelGroup or to simply extort or loot the Enterprise. The 
probability of a RebelGroup deciding to fight is defined by 


\[
\mathit{fightExpandProb}(\beta) = \text{[expression lost in extraction]}
\]

where $\beta$ is the RebelGroup's relative strength with respect to the main RebelGroup of the
Enterprise, adjusted to the scale $[-1, 1]$ using the function $f(x) = 2x - 1$.
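The target selection and the rescaling step described above can be sketched as follows. This is an illustrative sketch only: the names `rescale` and `choose_target` and the dictionary layout are hypothetical, and the sketch assumes every Enterprise has strictly positive wealth so that the inverse weight is defined.

```python
import random


def rescale(x):
    """Map a relative strength in [0, 1] to [-1, 1] via f(x) = 2x - 1."""
    return 2 * x - 1


def choose_target(enterprises, rng):
    """Pick one Enterprise with probability inversely proportional to wealth.

    `enterprises` maps a hypothetical Enterprise id to its wealth
    (assumed to be strictly positive).
    """
    ids = list(enterprises)
    weights = [1.0 / enterprises[e] for e in ids]  # poorer means heavier weight
    return rng.choices(ids, weights=weights, k=1)[0]


rng = random.Random(7)  # fixed seed so the draw is repeatable
target = choose_target({"e1": 50, "e2": 200, "e3": 400}, rng)
```

In this example the weights are 1/50, 1/200, and 1/400, so "e1" is four times as likely to be chosen as "e2" and eight times as likely as "e3".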

The decision to fight is unilaterally made by the attacking RebelGroup (Fight 
event). However, if the opponent is weaker than the attacker, it may form an 
alliance with another RebelGroup with a probability equal to the difference between 
the attacker's global strength and its own. If the opponent decides to form an 
alliance, the alliance is formed with the first randomly chosen other RebelGroup 
that generates an alliance stronger (sum of both RebelGroups' global strength 
forming the alliance) than the attacker. If the constraint is not fulfilled with any 
other RebelGroup, no opposition alliance is formed. 
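The alliance rule above can be sketched as follows. The function name `seek_alliance` and the `strengths` mapping are hypothetical; the sketch assumes global strengths are numbers in [0, 1] and that "first randomly chosen" means examining the other RebelGroups in a random order.

```python
import random


def seek_alliance(attacker, opponent, strengths, rng):
    """Return the id of the ally chosen by `opponent`, or None.

    `strengths` maps hypothetical RebelGroup ids to global strengths.
    """
    gap = strengths[attacker] - strengths[opponent]
    # Only a weaker opponent considers an alliance, with probability = gap.
    if gap <= 0 or rng.random() >= gap:
        return None
    candidates = [g for g in strengths if g not in (attacker, opponent)]
    rng.shuffle(candidates)  # examine the other groups in random order
    for ally in candidates:
        # Accept the first ally whose combined strength beats the attacker.
        if strengths[opponent] + strengths[ally] > strengths[attacker]:
            return ally
    return None  # constraint not fulfilled: no opposition alliance
```

Because the alliance probability equals the strength gap, an opponent that is only slightly weaker than the attacker will rarely look for allies.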

The attacker and its opponent (a single RebelGroup or an alliance) are then 
categorized based on their global strength as the strong and the weak RebelGroups. 
The strong RebelGroup has a probability of winning the fight equal to the relative 
strength between itself and the weak RebelGroup. If the strong RebelGroup does 
not win the fight, the weak RebelGroup has a probability of winning the fight equal 
to the relative strength between itself and the strong RebelGroup. 
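The two-stage draw that decides a fight can be sketched as follows. The definition of relative strength is not given in this passage, so the sketch assumes it is a side's share of the combined strength; the function names and the "a"/"b" labels are hypothetical. Note that under this reading a fight can end without a winner.

```python
import random


def relative_strength(own, other):
    """Assumed definition: own share of the combined strength."""
    return own / (own + other)


def resolve_fight(strength_a, strength_b, rng):
    """Return 'a', 'b', or None (no winner) following the two-stage draw."""
    # Categorize the two sides as strong and weak by global strength.
    if strength_a >= strength_b:
        strong, weak = ("a", strength_a), ("b", strength_b)
    else:
        strong, weak = ("b", strength_b), ("a", strength_a)
    # Stage 1: the strong side wins with probability equal to its relative strength.
    if rng.random() < relative_strength(strong[1], weak[1]):
        return strong[0]
    # Stage 2: otherwise the weak side gets its own chance.
    if rng.random() < relative_strength(weak[1], strong[1]):
        return weak[0]
    return None
```

For strengths 0.6 and 0.2, the strong side wins with probability 0.75, the weak side with 0.25 × 0.25 = 0.0625, and the remaining 0.1875 of fights have no winner.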

The fight winner receives a specific number of Enterprises from the loser. These 
Enterprises are chosen randomly among the loser's list of extorted Enterprises. If 
the winner or loser is an alliance, then the distribution of Enterprises is determined 
using the inverse of the relative strengths between the allied RebelGroups. 

In a fight, both the attacker and opponent (RebelGroup or formed alliance) suffer
losses of rebel numbers. The loss is proportional to the relative strength of the
opposing RebelGroup or alliance. The loss among RebelGroups forming an alliance
is inversely proportional to their relative strength.

However, if the RebelGroup does not fight to expand, then the RebelGroup 
decides whether to extort or loot the chosen Enterprise with probability equal to 
its global strength. 

Enterprises extorted by a different RebelGroup than their main RebelGroup 
always report the action of the former to the latter (Report event). Looted Enter- 
prises, however, decide whether to report or flee. 

If a RebelGroup is notified that another RebelGroup is extracting resources from 
Enterprises under its control, then it decides whether to fight the invader with a 
probability defined by 


\[
\mathit{fightReportProb}(\beta, x) = \frac{1}{\text{[denominator lost in extraction]}},
\]

where $\beta$ is the RebelGroups' relative strength adjusted to the scale $[-1, 1]$, and $x$
is the number of reports that the RebelGroup has already received about the invader
(reports). In the case of a fight, the RebelGroups interact as previously described
and the number of reports associated with the invader RebelGroup is reset to zero.

All RebelGroups' decisions to expand and fight are probabilistic and defined
based on the sigmoid function of the form $f(x) = \frac{1}{1 + e^{-x}}$, where $x$ combines
multiple factors relevant to the decision being made. The choice of this function
was motivated by (1) its boundedness to the range [0, 1], so that the result can be
applied directly to make a probabilistic decision, and (2) its S-shaped curve, which
grants a smooth decision transition between extremes while having an initial lag period
of slow growth and a final period of reduced growth rate.
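A minimal sketch of how a sigmoid value can drive a probabilistic decision is given below. It assumes the standard logistic function; how each decision combines its inputs (strength, elapsed time, number of reports) into x is not reproduced here, and the names `sigmoid` and `decide` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Standard logistic function, bounded to [0, 1] and numerically stable."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)


def decide(x, rng):
    """Bernoulli draw: True with probability sigmoid(x)."""
    return rng.random() < sigmoid(x)


rng = random.Random(11)
share = sum(decide(2.0, rng) for _ in range(10000)) / 10000  # close to sigmoid(2.0)
```

The two properties named in the text are visible directly: `sigmoid(0)` is exactly 0.5, while inputs of -4 and 4 give values near 0.018 and 0.982, the flat tails of the S-shaped curve.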


3.2 Scenario and Initialization 


A simulation scenario, also simply called scenario, specifies the initial objects, the 
initial events, and the initial values of model variables briefly described in Table 1. 

An experiment is defined on top of a simulation scenario by modifying one or
more variable settings according to the definitions of the experiment's parameters.
In the initialization of the simulation scenario, the specified numbers of RebelGroups
and Enterprises are created and their properties are initialized using the corresponding
model variable values defined in the scenario. Exceptions are those properties
that (1) depend on calculations, like the RebelGroup's wealth, which is set to the
number of rebels multiplied by the cost per rebel (nmrOfRebels * RG Rebel Cost),
and (2) depend on model variables defined as probability distributions, like the
RebelGroup's freqDemand and freqExpand and the Enterprise's freqIncome
and income, which are set to a number drawn from their respective model
variable distribution settings.
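The two exceptions above can be illustrated with a short sketch. It is not the OESjs code: the function `init_rebel_group` and the dictionary layout are hypothetical, and the sketch assumes the frequency settings are (mean, standard deviation) pairs of a normal distribution, as in the baseline values used later in the chapter.

```python
import random


def init_rebel_group(size, rebel_cost, demand_freq, expand_freq, rng):
    """Build one RebelGroup record; the field layout is hypothetical."""
    return {
        "nmrOfRebels": size,
        # Calculated property: wealth = number of rebels * cost per rebel.
        "wealth": size * rebel_cost,
        # Properties drawn from (mean, std) normal-distribution settings.
        "freqDemand": rng.gauss(*demand_freq),
        "freqExpand": rng.gauss(*expand_freq),
    }


rng = random.Random(42)  # the scenario's Random Seed
groups = [init_rebel_group(300, 291, (30, 2), (30, 2), rng) for _ in range(3)]
```

With 300 rebels at US$291 each, every group starts with a wealth of 87,300, while the drawn frequencies differ slightly from group to group.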
Table 1 Model variables defining a scenario

Model variable            Description
Simulation End Time       Length of the simulation run
Random Seed               Random generator's seed
Number Enterprises        Number of Enterprises
Income                    Enterprise's income
Income Frequency          Frequency at which Enterprises receive income
Flee Probability          Probability for looted Enterprises to flee
Flee Threshold            Number of loots Enterprises endure before fleeing
Number Rebel Groups       Number of Rebel Groups
RG Sizeᵃ                  Initial number of rebels per Rebel Group
RG Prop. Enterprisesᵃ     Proportion of the initial number of Enterprises per Rebel Group
RG Extortion Rateᵃ        Proportion of the Enterprise's accumulated income to extort
RG Rebel Costᵃ            Cost per rebel
RG Recruit Thresholdᵃ     Threshold of the Rebel Groups' strength to stop recruiting
RG Recruit Rateᵃ          Maximum size increase per wealth reallocation
RG Demand Frequencyᵃ      Frequency at which Rebel Groups decide to racketeer Enterprises
RG Expand Frequencyᵃ      Frequency at which Rebel Groups decide whether to expand
Fight Expansion           Number of Enterprises transferred from loser to winner of fight

ᵃ List with the same number of elements as the Number Rebel Groups
ᵇ Value is the mean and standard deviation defining a normal distribution
ᶜ Value is the minimum and maximum values of a uniform distribution


3.3 Implementation 


We implement our model using the JavaScript-based Object Event Simulation
framework OESjs,⁴ which is an open-source web-based simulation platform that
allows publishing simulation models on the Web and running them in any modern
web browser (Wagner, 2017). The Rebel Group Protection Rackets simulation is
available online at https://gnardin.github.io/RebelGroups and the source code is
available at https://github.com/gnardin/RebelGroups/releases (v1.0).


4 Experiments 


This section describes and examines two kinds of civil war experiments, one 
hypothetical related to security factors (Sect. 4.1) and another resembling the 
conditions of Somalia since 1992 (Sect. 4.2). 


⁴ More information about OESjs is available at https://sim4edu.com.

4.1 Security Experiments


We conduct experiments aimed at explaining the dynamics of rebel fighting, their 
impacts on the economy, and the importance of rebel groups’ economic bases of 
support to their sustainability in the context of a civil war. In particular, we analyze 
how security factors influence the dynamics of a civil war by varying the initial 
strength (Sect. 4.1.1) and initial allocation of Enterprises (Sect. 4.1.2). 

The hypothetical scenarios in these experiments use combinations of parameter 
values that represent real-world inter-rebel group configurations. These scenarios 
are based on 3-actor settings in which three different rebel groups strive for 
hegemony. Each experiment scenario varies just one parameter value in relation 
to the baseline, whose values are shown in Table 2. These values were chosen for 
demonstration purposes with the exception of rebel costs, for which there exists 
research on the upper and lower boundaries of rebel group expenses as explained in 
the following paragraphs. 

To estimate rebel costs, we neglect the start-up costs of rebel groups and 
focus only on maintenance costs. Wennmann (2009) calculates different cost 
ranges depending on the intensity of the conflict. We use present-day Somalia 
as an example of medium-intensity conflict and apply Wennmann's estimations 
accordingly. The estimated cost to recruit 1000 rebels ranges between US$2.5 
million and US$16.2 million. Al-Shabaab, the current primary militant organization 
in Somalia, numbered between 7000 and 9000 rebels in 2014 (BBC News, 2017). 
Because weapon supplies and maintenance costs are relatively inexpensive in this
case, we can estimate maintenance costs as US$3.5 million per year for 1000 rebels.
Thus, one rebel should cost about US$291 per 30-day period (US$3.5 million divided
by 1000 rebels and by 12 periods).⁵
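The per-rebel figure can be checked with a few lines of arithmetic. The sketch assumes that the US$3.5 million estimate refers to 1000 rebels and that a year is treated as twelve 30-day periods; both assumptions are inferred from the fact that they reproduce the reported value, not stated in the text.

```python
annual_cost = 3_500_000   # US$ per year, maintenance estimate from the text
rebels = 1000             # assumed group size that the estimate refers to
periods_per_year = 12     # assumed: twelve 30-day periods per year

cost_per_rebel = annual_cost / rebels / periods_per_year
print(round(cost_per_rebel, 2))  # 291.67
print(int(cost_per_rebel))       # 291, the value used for RG Rebel Cost
```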


Table 2 Baseline input parameter values to the experimental scenarios

Parameter             Value        Parameter               Value
Simulation End Time   365          Number Rebel Groups     3
Number Enterprises    3000         RG Sizeᵃ                300
Income                (200, 50)ᵇ   RG Prop. Enterprisesᵃ   33%
Income Frequency      1            RG Extortion Rateᵃ      10%
Flee Probability      50%          RG Rebel Costᵃ          US$291
Flee Threshold        3            RG Recruit Thresholdᵃ   80%
                                   RG Recruit Rateᵃ        50%
                                   RG Demand Frequencyᵃ    (30, 2)ᵇ
                                   RG Expand Frequencyᵃ    (30, 2)ᵇ
                                   Fight Expansion         50

ᵃ List with number of elements equal to the Number Rebel Groups with all element values the same
ᵇ Mean and standard deviation defining a normal distribution


⁵ In the future, these values may be modified to match other empirical cases.

All experiment scenarios are replicated 100 times,⁶ each replication using a
different random seed. All experiment scenarios use the same ordered set of random
seeds. Specifying the random seeds makes it possible to reproduce the results presented
in this study; otherwise, the probabilistic processes would produce deviations.
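The role of the ordered seed set can be illustrated with a toy stand-in for a simulation run. Everything in this sketch is hypothetical (the function `run_replication`, the seed values 1 to 100, and the quantity it returns); it only demonstrates that reusing the same ordered seeds reproduces the same sequence of results.

```python
import random


def run_replication(seed):
    """Stand-in for one simulation run; returns a made-up output count."""
    rng = random.Random(seed)  # every replication owns its seeded generator
    return sum(rng.random() < 0.3 for _ in range(100))


seeds = list(range(1, 101))  # hypothetical ordered seed set, one per replication
first = [run_replication(s) for s in seeds]
second = [run_replication(s) for s in seeds]
assert first == second  # identical seeds give identical results
```

Using the same ordered seeds for every scenario also means that differences between scenarios stem from the changed parameter rather than from a different stream of random numbers.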

The analyses of the experiment scenarios are based on a set of output variables:
Number of Extortions, Number of Lootings, Number of Expansions, Number of
Alliances, Number of Expansion Reports, Number of Fled Enterprises, Number
of Recruitments, Number of Rebel Expels, and Number of Rebel Deaths. The
results are calculated as the mean and standard deviation of the results of the
100 replications performed for each experiment scenario. Subsequently, we use
the Wilcoxon rank-sum test (see Wilcoxon, 1945) to identify whether the change
in scenario configurations entails statistically significant effects on the output
variables. We chose the Wilcoxon rank-sum test because we cannot assume that the
output values are normally distributed.
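The summary statistics and the rank-sum comparison can be sketched in plain Python. This is a simplified illustration, not the procedure used in the study: it uses the normal approximation without tie or continuity correction, the sample standard deviation is an assumption, and the two input lists are made-up values rather than simulation outputs.

```python
import math
import statistics


def rank_sum_test(xs, ys):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied.
    """
    pooled = sorted((v, i) for i, v in enumerate(list(xs) + list(ys)))
    ranks = [0.0] * len(pooled)
    pos = 0
    while pos < len(pooled):
        end = pos
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[pooled[k][1]] = avg_rank
        pos = end + 1
    n, m = len(xs), len(ys)
    w = sum(ranks[:n])  # rank sum of the first sample
    mean_w = n * (n + m + 1) / 2
    sd_w = math.sqrt(n * m * (n + m + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Hypothetical per-replication outputs for two scenarios (not data from the study).
a = [12, 15, 14, 10, 13, 16, 11, 17]
b = [22, 25, 21, 24, 20, 23, 26, 19]
mean_a, sd_a = statistics.mean(a), statistics.stdev(a)
z, p = rank_sum_test(a, b)
```

Because the test only uses ranks, it does not require the output values to follow a normal distribution, which is the reason given above for choosing it.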


4.1.1 Rebel Group Strength 


The first experiment evaluates the influence of different initial power distributions
among Rebel Groups, ranging from an equally balanced to a more hierarchical
setting. The experiment is composed of three scenarios that differ in the initial
number of rebels of each Rebel Group (see RG Strength in Table 3). In scenario
RGS1, Rebel Groups have equal strength, measured by the number of rebels
associated with them. In scenario RGS2, one Rebel Group has twice the strength of
each of the other Rebel Groups. Finally, scenario RGS3 invokes a stricter hierarchy
between the Rebel Groups in which the strongest Rebel Group is twice as strong as
the second strongest, and the second strongest is twice as strong as the weakest. The
selection of these configurations is theoretically motivated, providing insight on the
impacts of changes in the variation of individual parameters.

Table 3 Results (mean and standard deviation) of the Rebel Group Strength experiment that varies
the initial power distributions among Rebel Groups, ranging from an equally balanced setting (RGS1),
to one powerful Rebel Group (RGS2), to a more hierarchical setting (RGS3)

Rebel group strength (RGS) experiment scenarios

                              RGS1              RGS2              RGS3
RG Strengthᵃ                  [300, 300, 300]   [300, 150, 150]   [300, 150, 75]
Number of Extortions          3651 ± 1108.21    3821 ± 1406.26    4043 ± 1337.99
Number of Lootings            5122 ± 203.76     5108 ± 230.18     5086 ± 233.04
Number of Expansions          19 ± 4.02         19 ± 3.89         19 ± 4.53
Number of Alliances           0.05 ± 0.22       0.08 ± 0.27       0.1 ± 0.3
Number of Fights              9.8 ± 2.32        9.55 ± 2.17       9.62 ± 2.58
Number of Expansion Reports   6.45 ± 2.95       6.64 ± 2.60       6.27 ± 2.72
Number of Fled Enterprises    2851 ± 127.63     2842 ± 149.19     2823 ± 148.75
Number of Recruitments        2197 ± 908.77     1574 ± 687.67     1413 ± 567.98
Number of Rebel Expels        634 ± 531.22      407 ± 408.46      356 ± 384.31
Number of Rebel Deaths        2199 ± 619        1531 ± 477.43     1329 ± 419.18

ᵃ Values define the RG Size for Rebel Groups 1, 2, and 3, respectively

⁶ We also conducted the experiments with 1000 simulations but the results remained almost
identical.

The results suggest that Rebel Group size exerts varying impacts on security and
economic outcomes. The numbers of alliances and fights remain almost identical
and do not differ significantly across the three scenarios.
However, recruitment is highest in the equal distribution scenario (RGS1),
which indicates that power competition is most intense when no Rebel Group
dominates from the outset. The Wilcoxon rank-sum test is statistically significant for both
comparisons, RGS1 versus RGS2 (p < 0.01) and RGS1 versus RGS3 (p < 0.01). The
number of expelled rebels follows the same trend and is also statistically significant
for both comparisons (p < 0.01). Expelling occurs when the costs of maintaining a
large dominant force are too high to bear and capturing enterprises from opponent
rebel groups does not pay for its maintenance. Due to fewer recruitments in the
hierarchical scenarios (RGS2 and RGS3), the expel numbers fall. Further, the
average number of deaths differs by a wide margin, with roughly 870 fewer deaths
on average in the RGS3 scenario than in RGS1 (p < 0.01). The
high intensity of power competition in the first scenario translates into fights with
more fighters and hence more fatalities. This is reminiscent of currently ongoing
civil wars in Libya or Syria, in which the involved rebel groups are sufficiently
large to perpetuate and intensify the respective civil wars. Lastly, the number of
expansions does not differ and remains constant at 19 expansion events on average in each
scenario, indicating that similar relative strength differences are reached in all
configurations after a short period of time. By contrast, extortion increases
and looting decreases in more unequal scenarios, but the results are statistically
inconclusive.


4.1.2 Enterprise Allocation 


This experiment evaluates the effects of different initial allocations of Enterprises 
among Rebel Groups, from a balanced to a more unbalanced distribution (see 
Allocation Proportions in Table 4). This experiment is composed of three scenarios 
in which we vary the initial number of Enterprises under the Rebel Groups’ control. 
In scenario EA1, each Rebel Group starts with approximately the same number of
Enterprises. In scenario EA2, Rebel Group 1 has twice as many Enterprises as each of
the other two Rebel Groups. Finally, in scenario EA3, Rebel Group 1 initially has
control over 60% of all Enterprises, while Rebel Group 2 and Rebel
Group 3 initially have 30% and 10% of Enterprises under their control, respectively.
Table 4 Results (mean and standard deviation) of the Enterprise Allocation experiment that
varies the initial allocations of Enterprises among Rebel Groups, from a balanced (EA1) to a
powerful Rebel Group (EA2) to a more unbalanced and hierarchical distribution (EA3)

Enterprise allocation (EA) experiment scenarios

                              EA1                EA2                EA3
Allocation Proportionsᵃ       [34%, 33%, 33%]    [50%, 25%, 25%]    [60%, 30%, 10%]
Number of Extortions          3651 ± 1108.21     3466 ± 1052.23     3836 ± 1239.10
Number of Lootings            5122 ± 203.76      5163 ± 169.03      5124 ± 180.96
Number of Expansions          19 ± 4.02          20 ± 3.97          19 ± 3.36
Number of Alliances           0.05 ± 0.22        0.05 ± 0.22        0.04 ± 0.20
Number of Fights              9.8 ± 2.32         9.82 ± 2.39        9.54 ± 2.24
Number of Expansion Reports   6.45 ± 2.95        6.72 ± 2.85        6.21 ± 2.17
Number of Fled Enterprises    2851 ± 127.63      2874 ± 108.96      2852 ± 123.22
Number of Recruitments        2197 ± 908.77      2037 ± 783.10      1714 ± 810.87
Number of Rebel Expels        635 ± 531.22       623 ± 545.54       575 ± 471.30
Number of Rebel Deaths        2199 ± 619         2084 ± 519.54      1793 ± 502.07

ᵃ Values define the RG Prop. Enterprises for Rebel Groups 1, 2, and 3, respectively


The results show that the variation in the output for extortions, recruitments,
and deaths is statistically significant in the comparison between EA2 and EA3
(p < 0.05) (see Table 4). Since EA1 is not statistically different from either of the
other scenarios (EA2 and EA3), a trend in the distribution
of enterprises cannot be corroborated. Instead, EA3 appears to constitute an
idiosyncratic scenario that reports the lowest death and recruitment numbers due
to its highly unequal access to revenues for rebels. The skewed allocation of
enterprises between the different rebel groups puts the weakest rebel group at such
a disadvantage that fights are less intensive, with lower casualty rates. In turn, this
does not translate into fiercer competition between the more affluent rebel groups,
as evidenced by the low number of recruitments. Having a weak rebel group from
the outset changes the dynamics of a civil war by reducing the expected average
death toll compared to conditions in which revenue collection is sufficient for each
rebel group to engage in active violent competition. From a corporate perspective,
this scenario (EA3) is the most desirable of the three, as higher
extortion rates are tantamount to longer survival during the civil war (which is not
the same as surviving the entire civil war, as fleeing rates are very similar).


4.2 Somalia Case Study


We now consider the dynamics of a case that resembles the Rebel Group config- 
uration of a historical example: the civil war conditions of Somalia since 1992. 
Somalia is an applicable case due to its relative level of anarchy since the collapse 
of a longstanding dictatorship (Leeson, 2007; Powell et al., 2008). 

4.2.1 Historical Background


Somalia was led by dictator Mohammed Siad Barre until the Somali Civil War in 
1991, in which popular dissatisfaction with Barre's regime resulted in his overthrow. 
As a result of longstanding interclan tensions that Barre had fueled in his attempt to 
maintain his grip on power, the country exploded into multiple factions competing 
among one another for power and spoils. This led to a humanitarian disaster, 
prompting interventions by the United Nations and the USA (Baumann et al., 2004). 
However, by that time many militarized groups had proliferated and established 
bases of support in different areas of the country. The northern region of the country 
consolidated under the umbrella government of Somaliland, and the southern capital 
of Mogadishu collapsed into disarray (Leeson, 2007; Powell et al., 2008). 

Many organizations emerged in the southern competition for power, some of the 
most prominent including the United Somali Congress/Somali Salvation Alliance 
(USC/SSA), the United Somali Congress/Somali National Alliance (USC/SNA), 
the Southern Somali National Movement (SSNM), the United Somali Party (USP), 
the Somali National Front (SNF), the Somali Asal Muki Organization (SAMO), the 
Somali Patriotic Movement (SPM), and the Somali National Union (SNU) (Bau- 
mann et al., 2004). Other small, religious-based militias also emerged, alongside 
local clan militias. Following the attack on U.S. Army personnel during a raid 
against militia leader General Mohamed Farah Aideed of the USC/SNA that killed 
nineteen US soldiers, the USA withdrew its troops and numerous attempts by 
the international community to broker peace subsequently failed (Baumann et al., 
2004). 

While the northern provisional government of Somaliland presided over a 
shaky peace, the rest of Somalia remained mired in overall instability and virtual 
anarchy (Leeson, 2007; Powell et al., 2008). In 2006 the Islamic Courts Union 
(ICU) managed to consolidate control over most of southern Somalia and implement 
Sharia law, unifying much of the country and establishing a degree of security. How- 
ever, opposition organizations aligned with the transitional government challenged 
the ICU and fought with the support of the USA and Ethiopia, ultimately leading 
to ICU's withdrawal and defeat. Radical elements of the organization splintered 
off, forming new militant organizations such as the al-Qaeda-linked al-Shabaab. A 
new coalition Federal Government of Somalia (FGS) was formed with the support 
of foreign states, but this government has continually struggled against attacks by 
al-Shabaab, which holds substantial territory and at one point even controlled the 
capital city of Mogadishu (Ahmad, 2015). 

With the help of international military support, al-Shabaab's territory has been 
significantly reduced since 2012, yet it regularly carries out bombings and makes 
grabs for territory. Violence also continues to occur between rival clans and sub- 
clans, and other armed militia groups. Due to the ongoing conflict, much if not most
of the country lives in relative anarchy, without effective federal or local government control.
These conditions make Somalia a prime case for modeling economic opportunism 
in an atmosphere of insecurity. Al-Shabaab engages in extortionist activity against 
local businesses in order to finance its war-making activities, and it uses its 
violent attacks to threaten, intimidate, and punish civilians in order to continue this 
extortion (Ahmad, 2015). As a result, civilians often flee to neighboring countries, 
creating a continual refugee crisis. Al-Shabaab also engages in sophisticated infor- 
mation campaigns to recruit new fighters, facilitating its expansion and continual 
battles against the Western and Ethiopia-backed government. 


4.2.2 Data and Experimentation


We perform an experiment simulating three scenarios that each represents one of the 
three different stages in Somalia's conflict: first, a system with nine Rebel Groups 
of the same size and characteristics, in equal competition for power; second, a 
system with nine Rebel Groups but with varying sizes and extortion populations; 
and third, a system with one primary strong Rebel Group among many much smaller 
groups. The values used in the model are based on published estimations of group 
size and extortion of local populations, as well as other rebel and demographic 
characteristics (Abbink, 2009; Clarke, 1992; UNFPA, 2016). Each experiment 
scenario varies the RG Size and RG Prop. Enterprises values, while all other values 
are fixed as shown in Table 5. 

All experiment scenarios are replicated 100 times, each replication using one 
different random seed. The analyses of the experiment scenarios are based on a set 
of output variables whose values are calculated as the mean and standard deviation 
of the results of the 100 replications performed for each experiment scenario. 

Table 5 Somalia baseline input parameter values to the experimental scenarios

Parameter             Value        Parameter               Value
Simulation End Time   365          Number Rebel Groups     9
Number Enterprises    3000         RG Extortion Rateᵃ      10%
Income                (200, 50)ᵇ   RG Rebel Costᵃ          US$291
Income Frequency      1            RG Recruit Thresholdᵃ   80%
Flee Probability      50%          RG Recruit Rateᵃ        50%
Flee Threshold        3            RG Demand Frequencyᵃ    (30, 2)ᵇ
                                   RG Expand Frequencyᵃ    (30, 2)ᵇ
                                   Fight Expansion         1

ᵃ List with number of elements equal to the Number Rebel Groups with all element values the same
ᵇ Mean and standard deviation defining a normal distribution

The first scenario (SO1) represents the period immediately following the collapse
of the Barre regime, in which multiple groups increased in size and capability in the
fight against Barre but had not yet dominated one another. All nine Rebel
Groups have a size of 500 and an 11% proportion of Enterprises. The second scenario
(SO2) represents the conditions approximately 1 year later, once different groups
began to compete with one another. The USC/SSA and the USC/SNA emerged as
the two primary rival organizations, while the others maintained smaller niches in
the fight as well. Here, the first two Rebel Groups have a size of 500 members, while
the remaining groups have only 200 members each. The proportions of Enterprises
per Rebel Group are as follows: 18%, 18%, 1%, 14%, 4%, 6%, 14%, 13%, and
3%. These values are determined by the population distributions of the clans that
each group claims to represent (Baumann et al., 2004). This assumes that the Rebel
Groups each extort only from within their own clans.

The third scenario (SO3 and SO3*) represents a post-ICU Somalia in which al-
Shabaab is the primary extortionist Rebel Group, while many other small clan-based
militias also compete locally for power. In this case, one Rebel Group has a size of
7000, and the others each have a size of 200, in order to represent the approximate
empirical proportionality of different group sizes. The proportion of Enterprises for
the first group is 50%, and each of the remaining groups has 6.25%.
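The three scenario configurations can be written down compactly and checked for consistency. The dictionary layout below is hypothetical (it is not the OESjs scenario format); the numbers are the RG Size and RG Prop. Enterprises values given in the text.

```python
scenarios = {
    "SO1": {"sizes": [500] * 9, "props": [11] * 9},
    "SO2": {"sizes": [500, 500] + [200] * 7,
            "props": [18, 18, 1, 14, 4, 6, 14, 13, 3]},
    "SO3": {"sizes": [7000] + [200] * 8, "props": [50] + [6.25] * 8},
}

totals = {}
for name, cfg in scenarios.items():
    # Each scenario must describe exactly nine Rebel Groups.
    assert len(cfg["sizes"]) == len(cfg["props"]) == 9
    totals[name] = (sum(cfg["sizes"]), sum(cfg["props"]))
    print(name, totals[name])
```

The listed proportions sum to 99%, 91%, and 100% for SO1, SO2, and SO3, respectively. The text does not say whether the remainder in SO1 and SO2 reflects rounding or Enterprises that start outside any Rebel Group's control.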

The results shown in Table 6 indicate most clearly that observers should expect
more extortion to have taken place as power in Somalia became consolidated among
fewer groups. This is consistent with our theory that without peer
competition, rebel groups are most likely to establish a stable and widespread
racket system. In addition to increased extortion, consolidation should also lead to a
decrease in other violent activity such as looting and fighting. Although we observe
slight decreases in looting and fights across our experiments, these trends are not
significant due to large standard deviation values. We would otherwise expect that,
due to the high number of relatively weak groups in scenario SO1, every group
should have an incentive to increase its power quickly and primarily loot rather than
extort. We observe a very low level of alliances in all of the scenarios, possibly due
to the high plurality of competition. Overall, these results support our expectations that
a greater level of extortion is reached once one group has achieved hegemony, but
adjustments to the model may be necessary to lower standard deviations such that
other expected patterns become more significant.

Table 6 Experiment results of the Somalia case study

Somalia (SO) experiment scenarios

                              SO1             SO2             SO3             SO3*
Group sizes and proportions   All equal       Two stronger    One Hegemon     One Hegemon
Number of Extortions          641 ± 32.98     862 ± 75.44     1776 ± 128.64   2958 ± 913.72
Number of Lootings            5257 ± 46.05    5256 ± 45       5259 ± 45.36    5178 ± 130.12
Number of Expansions          49 ± 4.45       47 ± 4.07       49 ± 4.18       47 ± 6.96
Number of Alliances           0.68 ± 0.75     0.58 ± 0.70     0.67 ± 0.78     0.43 ± 0.78
Number of Fights              24 ± 4.30       25 ± 4          23 ± 4.04       24 ± 3.99
Number of Expansion Reports   14 ± 4.01       13 ± 3.53       15 ± 3          13 ± 3.84
Number of Fled Enterprises    2999 ± 1.03     2996 ± 5.95     2999 ± 5.03     2938 ± 92.49
Number of Recruitments        2395 ± 307.05   2481 ± 438.79   1497 ± 186.73   6172 ± 1021.36
Number of Rebel Expels        3233 ± 560.02   1778 ± 370.18   8253 ± 343.35   8431 ± 722.15
Number of Rebel Deaths        3660 ± 458.63   3096 ± 379.91   1841 ± 291.42   6008 ± 773.53

SO1 represents the period immediately following the collapse of the Barre regime. SO2 represents
the conditions approximately 1 year later once different groups began to compete with one another.
SO3 represents a post-ICU Somalia in which al-Shabaab is the primary extortionist Rebel Group,
while many other small clan-based militias also compete locally for power. SO3* is similar to
SO3, except that it represents the presence of substantial outside funding comprising about 80% of
the Rebel Group's revenue

A notable pattern emerges during scenario SO3 in which the largest group 
quickly shrinks in size, with its number of fighters becoming more similar to that 
of its competitors. Extortion profits are insufficient to continue allocating pay to the 
proportionally high number of fighters used to represent al-Shabaab in this experi- 
ment. Therefore, after only a short period of time, the large group is unable to sustain 
itself. It does not remain a hegemon, possibly causing an unexpectedly high number 
of observed incidents of looting, fighting, and fled Enterprises in SO3. This may also 
be the cause of high numbers of rebel expels and rebel deaths. In reality, a dominant 
group like al-Shabaab receives a substantial portion of its income from external 
sources, such as foreign remittances and overseas benefactors (Keatinge, 2014). It 
also engages in its own trade and business practices in order to raise revenue. These 
sources of income, which likely enable al-Shabaab to sustain its large number of 
fighters and widespread activities, are not accounted for in our model. 

Therefore, we perform a second simulation of the third scenario (SO3*) in which 
all parameters remain the same as in SO3, except in order to represent the presence 
of substantial outside funding comprising about 80% of revenue, we lower the cost 
per rebel fighter for the hegemon from US$291 to US$50.⁷ Our results in SO3*
show even greater extortion activity in a system with other revenue sources; the 
number of extortions is significantly higher than those in the first three scenarios. 
The number of lootings and fleeing Enterprises is slightly lower, but these results 
fall within the margin of error and are not statistically significant. Further, the 
number of fights and the number of Rebel deaths do not decrease as expected in 
a system with greater stability and control under one hegemon. The model captures 
greater financial stability, but not greater securitization. These results are consistent 
with theories that external funding to rebel groups actually increases violence levels 
and conflict severity (Fearon, 2004; Weinstein, 2006). This impact would therefore 
counteract economic benefits and explain the lack of decrease in looting, fighting, 
and deaths. 

⁷ 80% is a rough approximation used to represent outside funding as a substantially larger source
of income than extorted funds.

Data from the Uppsala Conflict Data Program (UCDP) show the highest numbers
of civilian and combatant deaths in Somalia at the height of its collapse into civil
war in 1991 and 1992 (Croicu and Sundberg, 2017; Sundberg and Melander, 2013).
Conflict during this time period, in which a multitude of rebel groups competed
among themselves for power, resulted in a markedly higher number of refugees
fleeing looting and violence at the beginning of the war than later on (Leeson, 2007;
Powell et al., 2008). In 2006, once the Islamic Courts Union (ICU) achieved
a monopoly of control over most areas of the country, violence ceased. However, 
fighting soon resumed at a lower but consistent level as foreign-backed groups 
rose to oppose its power and al-Shabaab emerged, using violence to try to reestablish 
order and intimidate its enemies. Since its inception, the group has managed to maintain 
a steady extortion racket in areas under its control (Keatinge, 2014), which is 
consistent with the results of our model. Although the results do not capture the 
significant decrease in violence and fleeing from early levels, they do represent how 
extortion increases with consolidation of power over time. 

Our results are also limited in that they provide no insight into the stability 
or security of the internal practices of a hegemonic group. For example, despite 
al-Shabaab's monopoly on extortion and violence in many communities of 
Somalia, the violent practices of this internationally condemned terrorist 
organization, such as intimidation to extract payments, strict policing and law 
enforcement that utilizes corporal punishment, and acts of terrorism, result in 
overall instability for civilian residents. Further, its monopoly on power is 
inconsistent, and clashes rise and fall in frequency as al-Shabaab loses and regains 
control of territory. Hegemonic control therefore does not necessarily equate to 
overall improvement in stability and security. However, our model does demonstrate 
some of the immediate effects of political contestation between different numbers 
of groups in civil war. 


5 Conclusion and Discussion 


The experiments using our agent-based simulation model indicate that unequal 
power distribution between rebel groups leads to less fighting, less rebel recruitment, 
and lower death rates in a civil war. Unequal enterprise allocation similarly affects 
extortion, with the highest incidence rate when one hegemon overwhelmingly 
dominates. Ultimately, wars with unequal power distributions may be more likely 
to end sooner and with fewer fatalities. This finding supports existing research on the 
effects of rebel group strength parity, which may impede conflict termination due to 
miscalculations and bargaining failures (Humphreys, 2005, p. 504). It contradicts, 
however, findings that parity could lead to earlier termination due to the prolonged 
costs of a stalemate (Zartman, 2000). We also find that a more hegemonic system 
results in greater economic stability. This does not necessarily result in greater 
security, as demonstrated by the Somalia case study. Other factors such as external 
funding of rebel groups and the brutality of their internal practices may cause violent 
activity despite consolidation of extortion practices. 

Our agent-based model provides an important methodological supplement to 
existing studies of conflict that use quantitative statistical analyses and qualitative 
case studies. As civil wars are highly complex dynamic systems, the evaluation 
of individual parameters and their impacts on civil wars must be understood 
and investigated against the backdrop of continually changing environments. The 
simulation experiments provide us with expectations for the trajectories of conflicts, 
depending on various economic and security conditions. The inclusion of 
probabilistic functions allows us to account for random variation or error that could 
capture the unpredictability of real-world events. We encourage interested readers 
to explore various initial settings to develop an understanding of civil war dynamics 
based on the assumptions laid out in this study (executable at 
https://gnardin.github.io/RebelGroups). 
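
The role of probabilistic functions and repeated runs can be illustrated with a toy example. This is a minimal sketch and not the authors' model: it assumes a single hypothetical event type (extortion) that occurs for each Enterprise with a fixed probability per time step, and all names and parameter values are placeholders chosen for illustration.

```python
import random
import statistics


def run_once(n_enterprises: int, p_extort: float, n_ticks: int, seed: int) -> int:
    """Count extortion events in one toy run (not the published model)."""
    rng = random.Random(seed)  # one generator per run keeps runs reproducible
    extortions = 0
    for _ in range(n_ticks):
        for _ in range(n_enterprises):
            # Each Enterprise is extorted in this tick with probability p_extort.
            if rng.random() < p_extort:
                extortions += 1
    return extortions


def replicate(n_runs: int, n_enterprises: int, p_extort: float, n_ticks: int):
    """Repeat the run with different seeds; return mean and standard deviation."""
    counts = [run_once(n_enterprises, p_extort, n_ticks, seed) for seed in range(n_runs)]
    return statistics.mean(counts), statistics.stdev(counts)


mean_count, sd_count = replicate(n_runs=30, n_enterprises=100, p_extort=0.2, n_ticks=50)
print(f"extortions per run: mean={mean_count:.1f}, sd={sd_count:.1f}")
```

Reporting the spread across seeds alongside the mean is what allows differences between scenarios to be judged against run-to-run variation.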

Future adjustments to the model could include incorporating external influences 
such as foreign military intervention or outside sources of rebel funding. The 
difference in results between scenarios SO3 and SO3* in the Somalia experiment 
demonstrates the potential significance of this addition. Another potential 
improvement could be the inclusion of spatial data. The model currently does not account 
for "space" or "territory," which are instead represented through the allocation 
of Enterprises. Spatial data could be used to place restrictions on Rebel Groups' 
directions of expansion and initiate balancing. Lastly, since the programming code 
remains open-source, interested readers can engage in modifications and alter our 
assumptions. For instance, event types like fighting or expansions can be modified to 
reflect different understandings of these behaviors. New event types that are not 
yet included in the model can also be created (e.g., intervention or the provision of foreign aid). 
Parameters can also be interpreted as representing rebel group characteristics. For instance, 
religiously and ideologically based rebel groups like al-Shabaab might have higher 
expansion rates compared to ethnically based rebel groups in Somaliland. 
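
One way to think about parameters as group characteristics is sketched below. The class `RebelGroupParams`, its field names, and the expansion-rate values are hypothetical placeholders and are not taken from the published source code; only the per-fighter cost of US$291 appears in the scenarios above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RebelGroupParams:
    name: str
    expansion_rate: float    # hypothetical: chance of attempting expansion per tick
    cost_per_fighter: float  # US$ per rebel fighter


# Placeholder values for illustration only; not calibrated to any data.
baseline = RebelGroupParams("ethnically based group", expansion_rate=0.05,
                            cost_per_fighter=291.0)
# Derive a variant that differs only in name and expansion rate.
ideological = replace(baseline, name="ideologically based group",
                      expansion_rate=0.10)

print(baseline)
print(ideological)
```

Keeping each group's characteristics in one immutable record makes it straightforward to vary a single assumption between runs while holding everything else fixed.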

Apart from their scientific contributions to the field of conflict studies, agent-based 
models of civil wars have strong potential to contribute to informed policy-making 
by allowing users to predict the range of possible ways in which a civil 
war can evolve. Their usefulness depends on the successful gathering of data to 
reflect realistic conditions as closely as possible. Such models could serve as a 
useful tool for political decision-making, as model results may provide clues for the 
expected intensity of wars and their possible impacts on local economies in conflict 
countries. For instance, an adjusted model that takes foreign influences into account 
could attempt to simulate the potential outcomes of a currently looming civil war 
in Venezuela. Using information on the strength of potential rebel groups based on 
the opposition as well as specifying additional autonomous actors like the army and 
local groups (colectivos) would allow one to simulate the impact of decisions like 
the provision of military aid to either group on ongoing civil war dynamics. 

Using network event sequences (where each tie between a sender and a target in a 
Link: https://github.com/tamos/ValuesInText 


Chapter 6: Agent-Based Simulation Model Simulating 
as a workload manager. The online documentation provides an example for running 
a basic simulation locally and showcases its output graphically. 
Link: https://github.com/JuKo007/SimulatingNormativeConflict 


Chapter 7: Agent-Based Simulation Model ProtestFate 


A NetLogo model for the coevolution of protest activation and topic selection 
online and in the streets authored by Jan Lorenz, Ahmadreza Asgharpourmasouleh, 
Masoud Fattahzadeh, and Daniel Mayerhoffer. 

Link: https://doi.org/10.5281/zenodo.3243818 
Groups’ Attack Timing 
of ties between non-state armed groups (NSAGs) varies. Code to statistically 
analyze empirical attack data with a multivariate Hawkes process and the resulting 
network of ties between NSAGs. 

Link: https://github.com/adamrpah/BIGSSS-Terror 
Replication code and links to relevant datasets. 
Link: https://github.com/AndrSalvi/onthebeatenpath 


Chapter 10: Replication Code to Analysis of Conflict Diffusion 
over Continuous Space 


Replication code and links to relevant datasets. 
Link: https://github.com/ckelling/conflict_diffusion 


Chapter 11: Agent-Based Simulation Model Rebel Group 
Protection Rackets 


The agent-based simulation model Rebel Group Protection Rackets attempts to 
capture the decision-making and behavior of the involved actors during protection 
racket interactions as well as the cooperation and competition between rebel groups 
to control territory. The model reveals insights about the mechanisms that are helpful 
for understanding violence outcomes in civil wars, and the conditions that may lead 
rebel groups to prevail. Analysis of various scenarios demonstrates the impact that 
different security factors play on civil war dynamics. 

Source-code Link: https://github.com/gnardin/RebelGroups 